Forwarded from: marcov@stack.nl (Marco van de Voort)
In gmane.comp.compilers.gpc, you wrote:
So basically, I'm just saying that the lack of short strings is seriously hindering GPC adoption on the Mac, and that this should probably be taken into account when pondering the priority of their implementation.
I see. When I have more than a few days to work on GPC (currently, I'm still busy with other work, you know, the paying kind ...), I'll see if I can do it. Unless Waldek beats me to it ...
(Besides Mac it also helps TP/Delphi compat a bit, though I can't quickly remember a single place where Delphi uses them. In 32-bit Delphi it is nearly fully legacy)
FPC is like Delphi, but since it supports the TP libs (Delphi doesn't), there is some use, also in the textmode IDE. Also, the compiler and utils themselves originally used shortstrings. However, we have been moving away from that for some time now (since 2000 already).
Well, I used BP myself a long time ago, and had to port my code to GPC at some time. Short strings were really one of the minor issues, since the BP interfaces are all on the Pascal level, so they simply changed to long strings in GPC. Apart from the rare access to "[0]" or layout in binary files (which I didn't use much anyway), there was nothing much to be changed (except for sometimes declaring a bigger maximum size :-).
In time, the core compiler itself (symbol tables etc) will be the only non-legacy code using shortstrings. This is because the size limit is no problem for identifiers/tokens etc, and it is quite a bit faster.
This could be because the static nature allocates them together with the objects, while dynamic strings would be an extra allocation,
I suppose you're talking about Delphi's string variant which is only dynamic AFAIR. However, Extended Pascal's strings can be static or (explicitly) dynamic, just like other types, so you can use them mostly the same way as short strings, and there is no extra overhead, except for a few bytes more to store the bigger length field and the capacity.
Note that FPC also advises using the ansistring type as much as possible. Dynamic string types are the future; static types with larger boundaries will also perish. This is because more and more libraries (under the influence of GNU policy) specify their maximal string sizes quite high. Putting very large strings on the stack will gigantically inflate stack size.
I think for some cases, limited strings are still useful, for other cases, unlimited strings are preferable. EP strings leave the choice to the programmer (use pointers or not). Also, using unlimited strings doesn't necessarily mean wasting stack size, e.g. when passing a string by value to an undiscriminated EP string parameter, only the actual size is allocated on the stack (and with "const" parameters, we don't have to copy the string at all, of course). There are more problems when the size is not known in advance (e.g., on input from external sources, or with function results whose size would have to be known before calling the function). In this case, explicit pointers can be used.
This is true at a high level, when no external interfaces are a concern. When it comes to binary file formats (as you mentioned), or network protocols, or external binary interfaces (libraries or, like here, the OS), things such as storage size, alignment and byte order matter, and those are not described in Pascal.
Correct. And it is not just the interfaces, but also demos, documentation and nearly all Pascal code available on the Mac.
As I said WRT BP above, I think such code, as long as it doesn't use external short-string interfaces, can probably convert to (EP) long strings with minimal effort.
Also keep in mind that for BP (and the Mac Pascal compilers) the UCSD type was pretty much the only one. While BP had a few pchar helper kludges, these were mostly standalone.
Yes, and quite incompatible. OTOH, BP short strings were mostly compatible with standard Pascal strings -- except for the well-known "[0]" issue (*), and binary layout, and some obscure problem with "const" parameters, but for the most part they were source-code compatible.
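To make the "[0]" and binary-layout points concrete, here is a minimal sketch. It assumes Free Pascal's ShortString, which follows the BP layout (one length byte followed by the characters); the program name and the type TName are made up for illustration.

```pascal
program ShortLayout;
{$mode objfpc}{$H-}

type
  TName = String[20];   { hypothetical type with capacity 20 }

var
  n: TName;
begin
  n := 'Pascal';
  { Element 0 stores the current length as a character. }
  WriteLn(Ord(n[0]) = Length(n));
  { Storage is one length byte plus the declared capacity,
    regardless of how many characters are currently in use. }
  WriteLn(SizeOf(TName));        { 21 }
  WriteLn(SizeOf(ShortString));  { 256 }
end.
```

Code that reads or writes such records in binary files depends on exactly this layout, which is why it cannot simply be switched to another string type.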
However FPC and GPC have other automated first tier stringtypes. This means that besides the stuff that TP can do, additional conversions to the native stringtypes are at least an option.
This makes the interface between Carbon UI code or TP legacy code and other parts that use native strings easier. (And it keeps people from sticking with shortstrings for the rest of their app because of legacy.)
In practice this means auto conversion by assignment and passing to value parameters. FPC/Delphi do this already.
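As an illustration of conversion by assignment and by value parameter, here is a minimal sketch in Free Pascal. The routine NativeLength and the variable names are hypothetical; only the implicit ShortString/AnsiString conversions themselves are FPC behaviour.

```pascal
program AutoConv;
{$mode objfpc}{$H+}

{ Hypothetical routine that takes the native (dynamic) string type by value. }
function NativeLength(s: AnsiString): Integer;
begin
  Result := Length(s);
end;

var
  legacy: ShortString;
  native: AnsiString;
begin
  legacy := 'Carbon';
  native := legacy;               { short -> dynamic by assignment }
  WriteLn(NativeLength(legacy));  { short -> dynamic at the call site }
  legacy := native;               { dynamic -> short; cut off beyond 255 chars }
  WriteLn(legacy);
end.
```

Each of these conversions copies the characters, which is the price of mixing the two string kinds at an interface boundary.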
We do that for CStrings, so most of the time we can avoid using CStrings in Pascal code this way. An automatic back-conversion (e.g., for function results) also seems possible, though we don't do that yet. Real reference parameters would be a problem, but they are rarely needed (I suppose, also WRT short strings in Mac OS).
So this might indeed be an option for short strings (i.e., no direct support, but automatic conversions). It will certainly be easier to implement, but it lacks some features (such as binary compatibility), so I might go for the full support anyway, if it doesn't present unsurmountable problems ...
I believe I mentioned this before, but for FPC/Delphi compat it would be absolutely great btw if setlength(stringtype,newlength) was implemented for all string types as the opposite of length(). This allows keeping new string code somewhat type (and compiler) independent.
Of course. We have SetLength for EP strings, and when we add short strings (or any other kinds of strings), we'll certainly extend it to them, too.
The shortstring implementation is simply something like
procedure setlength(var s:shortstring;x:integer);
begin
  if x>255 then x:=255;
  s[0]:=chr(x);
end;
This only works for strings of capacity 255. The following routines (taken from our "a little bit of GPC compatibility for BP" unit, gpc-bp.pas) work with all capacities. (*) Using them in BP code avoids the "[0]" access, except in SetLength itself, further increasing the source-code compatibility with BP.
{$P+}
function GetStringCapacity (var s: String): Integer;
{ NOTE: the parameter must be var (not const), otherwise BP gets
  the capacity wrong! }
begin
  GetStringCapacity := High (s)
end;

procedure SetLength (var s: String; NewLength: Integer);
begin
  s[0] := Chr (Min (GetStringCapacity (s), Max (0, NewLength)))
end;
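For comparison, here is a minimal sketch of what type-independent code looks like once SetLength is available for every string type. It assumes Free Pascal, whose built-in SetLength accepts both short and dynamic strings; the program and variable names are illustrative only.

```pascal
program SetLenDemo;
{$mode objfpc}{$H+}

var
  s: ShortString;
  a: AnsiString;
begin
  s := 'abcdef';
  a := 'abcdef';
  { The same call works for both kinds of string, so the calling
    code never has to touch s[0] directly. }
  SetLength(s, 3);
  SetLength(a, 3);
  WriteLn(s, ' ', a);                   { abc abc }
  WriteLn(Length(s), ' ', Length(a));   { 3 3 }
end.
```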
-- don't know if that's required). This would generally require copying the whole string.
This is not a problem in general. A hybrid system always has penalties, and people _choose_ to use it. Mostly subsystems are internally one string type, and only the interfaces between the subsystems aren't.
I hope so. OTOH, I've seen in the past a lot of BP programmers use CStrings (i.e., "PChar") throughout in their Pascal code, after they were added to BP (version 6 or 7), probably also because of Borland's marketing them as the next big thing. (I admit I almost fell for it myself, but when I saw the drawbacks, I converted back to short strings what I had changed to CStrings already, fortunately, so I could easily convert the code to EP strings with GPC later.)
Copying conversions are required for literals anyway; they can't be avoided, since shortstrings are not lazily assigned.
I'm not sure if that's required. We could probably also emit the literals as short strings when needed (like we do with literals used as CStrings). We'll see which will be easier in the end.
Frank