Forwarded from: marcov@stack.nl (Marco van de Voort)
In gmane.comp.compilers.gpc, you wrote:
(a summary from a Mac outsider. Peter, Adriaan, feel free to correct)
... snip ...
So basically, I'm just saying that the lack of short strings is seriously hindering GPC adoption on the Mac, and that this should probably be taken into account when pondering the priority of their implementation.
I see. When I have more than a few days to work on GPC (currently, I'm still busy with other work, you know, the paying kind ...), I'll see if I can do it. Unless Waldek beats me to it ...
(Besides Mac it also helps TP/Delphi compat a bit, though I can't quickly remember a single place where Delphi uses them. In 32-bit Delphi it is nearly fully legacy)
FPC is like Delphi, but since it supports the TP libs (Delphi doesn't), there is some use, also in the textmode IDE. Also, the compiler and utils themselves originally used shortstrings. However, we have been moving away from that for some time now (since 2000). In time, the core compiler itself (symbol tables etc.) will be the only non-legacy code using shortstrings. This is because the size limit is no problem for identifiers/tokens etc., and shortstrings are quite a bit faster.
This could be because their static nature allocates them together with the objects, while dynamic strings would need an extra allocation, but IIRC the compiler uses pointers to shortstrings.
On a tangent, what do you think is the future of short strings anyway?
Dodo.
Note that FPC also advises using the ansistring type as much as possible. Dynamic string types are the future; static types with larger bounds will perish too. This is because more and more libraries (under the influence of GNU policy) specify quite high maximum string sizes, and putting very large strings on the stack would inflate stack usage enormously.
Also, more and more OSes allow really deep directory structures and long directory names. We have already had complaints about a forgotten 1024-char filename limit in some routine.
Does Mac OS support those at all, with a different interface, or what?
The base libs under OS X are so *nix-like that we pretty much use the *BSD RTL for Darwin too. Carbon mostly uses shortstrings (and not just [255], afaik), and Cocoa is Objective-C.
Secondly, charsets. When 16-bit charsets (i.e., Unicode) become common (or are already), will/should there be a 16-bit "short" string (i.e., max. 65535 characters)?
I don't think so. Don't bother unless something needs them. Shortstrings mean Carbon and TP. Nearly all of that code is not Unicode-safe anyway, and if there is a major investment, it will go in the direction of dynamic strings.
Or else, if the interfaces will/do use UTF-8, we'll need conversions (automatic and/or manual) between Unicode and UTF-8 anyway?
Yes.
(As discussed before, UTF-8 doesn't make for a string type in Pascal.)
_Some_ routines must be adapted for UTF-8. However, most routines keep working due to the properties of the encoding (for example, ASCII bytes never occur inside a multibyte sequence).
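To illustrate why most routines survive UTF-8, here is a minimal Free Pascal sketch (the function name Utf8CodePointCount is made up for this example): byte-level Length and Pos keep working on UTF-8 data, and only a routine that needs to count characters has to look at the encoding, here by skipping continuation bytes (bit pattern 10xxxxxx).

```pascal
program Utf8Demo;
{$mode objfpc}

{ Counts code points by counting every byte that is NOT a UTF-8
  continuation byte (continuation bytes have the bit pattern 10xxxxxx). }
function Utf8CodePointCount(const s: ShortString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  s: ShortString;
begin
  { 'cafe' with an accented e: U+00E9 is encoded as the bytes $C3 $A9 }
  s := 'caf' + #$C3#$A9 + ' au lait';
  WriteLn('bytes:       ', Length(s));              { 13 }
  WriteLn('code points: ', Utf8CodePointCount(s));  { 12 }
  { Byte-oriented search still works: ASCII bytes never occur inside a
    multibyte sequence, so a match cannot start in the middle of the e. }
  WriteLn('Pos of au:   ', Pos('au', s));           { 7 }
end.
```

Routines that index or truncate by character count (rather than by byte) are the ones that would need adapting.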
Just to figure out if supporting short strings will only be a short-lived solution and more work in this area will be required soon anyway ...
As far as I can see, the problem is that Apple expected more Cocoa uptake (keep in mind that Apple ships Java as a major subsystem). However, that system is not very friendly for external use, at least from our viewpoint.
While a Cocoa-capable GPC would be cool, it would probably be a significant amount of work, and since, to my knowledge, you'd be the first Pascal compiler to do so, that would mean a slow process with lots of additional work (converting demo programs, documenting GPC/Pascal-specific workarounds, etc.).
So for now, and for some time to come, Pascal on the Mac is Carbon, mainly because the barrier to using Cocoa on a large scale is probably very significant.
This is true on a high-level, when no external interfaces are a concern. When it comes to binary file formats (as you mentioned), or network protocols, or external binary interfaces (libraries or, like here, the OS), things such as storage size, alignment and byte order matter, and those are not described in Pascal.
Correct. And it is not just the interfaces, but also demos, documentation and nearly all Pascal code available on the Mac.
BTW, short strings are better described as an array starting from 0, with the Ord of the 0th element representing the length, as e.g. BP does explicitly. A record would need an Integer field for the length, and, apart from alignment etc., it's not at all certain that this field has the same size as a Char, even if declared as a subrange (e.g., in GPC it wouldn't, except in a packed record, which in turn wouldn't allow e.g. passing the elements by reference, which BP short strings do allow).
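A minimal Free Pascal sketch of this layout, assuming FPC's String[N] as the short-string type (TStr10, TLenRec, TByteArr and UpcaseChar are illustrative names): the length sits in byte 0 in front of the characters, so a short string of capacity 10 occupies 11 bytes, a record with an Integer length field is larger, and each element is an ordinary Char that can be passed by reference.

```pascal
program ShortStringLayout;
{$mode objfpc}

type
  TStr10 = String[10];          { short string with capacity 10 }
  { Hypothetical record alternative with an explicit length field }
  TLenRec = record
    Len: Integer;
    Chars: array [1..10] of Char;
  end;
  PByteArr = ^TByteArr;
  TByteArr = array [0..10] of Byte;

procedure UpcaseChar(var c: Char);
begin
  c := UpCase(c);
end;

var
  s: TStr10;
  raw: PByteArr;
begin
  s := 'hello';
  raw := PByteArr(@s);                              { view the string as raw bytes }
  WriteLn('SizeOf(TStr10)  = ', SizeOf(TStr10));    { 11: 1 length byte + 10 chars }
  WriteLn('length byte     = ', raw^[0]);           { 5 }
  WriteLn('Ord(s[0])       = ', Ord(s[0]));         { 5, the same byte }
  { Larger than 11; the exact value depends on the size and alignment of Integer }
  WriteLn('SizeOf(TLenRec) = ', SizeOf(TLenRec));
  UpcaseChar(s[1]);   { elements are plain Chars, so var passing works }
  WriteLn(s);         { Hello }
end.
```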
Also keep in mind that for BP (and the Mac Pascal compilers) the UCSD type was pretty much the only one. While BP had a few pchar helper kludges, these were mostly standalone.
However, FPC and GPC have other automated first-tier string types. This means that besides what TP can do, additional conversions to the native string types are at least an option.
This makes interfacing between Carbon UI code or TP legacy code and other parts that use native strings easier (and avoids people sticking with shortstrings for the rest of their app because of legacy).
In practice this means auto conversion by assignment and passing to value parameters. FPC/Delphi do this already.
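As a sketch of those automatic conversions, assuming Free Pascal (TCarbonStr and Greeting are hypothetical names): a short string is converted implicitly when passed to an AnsiString value parameter, and assigning a longer AnsiString back to a short string silently truncates it to the declared capacity.

```pascal
program AutoConvert;
{$mode objfpc}

type
  TCarbonStr = String[31];   { hypothetical short string, e.g. from a legacy API }

{ Code written against the native dynamic string type }
function Greeting(Name: AnsiString): AnsiString;   { value parameter }
begin
  Result := 'Hello, ' + Name + '!';
end;

var
  Legacy: TCarbonStr;
  Native: AnsiString;
begin
  Legacy := 'World';
  Native := Greeting(Legacy);    { short string -> AnsiString on a value parameter }
  WriteLn(Native);               { Hello, World! }
  Legacy := Native + ' This text is longer than 31 characters.';
  WriteLn(Length(Legacy));       { 31: assigning back truncates to the capacity }
end.
```

The silent truncation on the way back is the main thing to watch at such an interface.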
I believe I mentioned this before, but for FPC/Delphi compat it would be absolutely great btw if
setlength(stringtype,newlength) was implemented for all string types as the counterpart of length(). This allows keeping new string code somewhat type- (and compiler-) independent. The shortstring implementation is simply something like
procedure setlength(var s:shortstring;x:integer);
begin
  if x<0 then x:=0;       { clamp negative lengths }
  if x>255 then x:=255;   { shortstring capacity }
  s[0]:=chr(x);
end;
-- don't know if that's required). This would generally require copying the whole string.
This is not a problem in general. A hybrid system always has penalties, and people _choose_ to use it. Mostly subsystems are internally one string type, and only the interfaces between the subsystems aren't.
Copying conversions are required for literals anyway, but they can't be avoided, since shortstrings are not lazily assigned.
Ironically, converting to C-Strings (input parameters only, as long as the strings do not contain Chr (0)) is easier, since one can keep the string in place and only has to add a Chr (0), provided space is reserved for it in advance (as GPC does). That's why we have fewer such problems with C-string based OS interfaces (POSIX, DOS, Windows, ...), as most string parameters are input, and extra work is only required for the few other cases.
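A minimal sketch of the in-place trick, in Free Pascal and assuming the string type is declared with one spare character of capacity (TPathStr and AsCString are hypothetical names; GPC's own mechanism is internal to the compiler and not shown here): the helper writes Chr (0) after the last character and hands out a pointer to the first one, so nothing is copied.

```pascal
program InPlaceCString;
{$mode objfpc}
uses strings;   { StrLen }

type
  { Capacity 21: room for at most 20 characters plus the terminating Chr (0) }
  TPathStr = String[21];

{ Hypothetical helper: terminates s in place and returns a pointer to its
  first character without copying. Only valid for input-only parameters,
  while s contains no Chr (0), and only if a spare byte is left. }
function AsCString(var s: TPathStr): PChar;
begin
  if Length(s) >= High(s) then
    Exit(nil);                   { no spare byte for the terminator }
  s[Length(s) + 1] := #0;        { write past the last char; length byte unchanged }
  Result := @s[1];
end;

var
  Path: TPathStr;
begin
  Path := '/usr/local';
  WriteLn(StrLen(AsCString(Path)));   { 10 }
  WriteLn(AsCString(Path));           { /usr/local }
end.
```

Because the pointer aliases the string, it is only safe for input parameters and only until the string is modified again.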
Delphi/FPC Ansistring does the same. But I believe I described that in detail before already.
Frank Heckenbach wrote:
Forwarded from: marcov@stack.nl (Marco van de Voort)
While a Cocoa-capable GPC would be cool, it would probably be a significant amount of work, and since, to my knowledge, you'd be the first Pascal compiler to do so, that would mean a slow process with lots of additional work (converting demo programs, documenting GPC/Pascal-specific workarounds, etc.).
Some work has already been done by the GNU Modula team, see http://en.wikipedia.org/wiki/Objective_Modula-2
Regards,
Adriaan van Os
Forwarded from: marcov@stack.nl (Marco van de Voort)
In gmane.comp.compilers.gpc, you wrote:
So basically, I'm just saying that the lack of short strings is seriously hindering GPC adoption on the Mac, and that this should probably be taken into account when pondering the priority of their implementation.
I see. When I have more than a few days to work on GPC (currently, I'm still busy with other work, you know, the paying kind ...), I'll see if I can do it. Unless Waldek beats me to it ...
(Besides Mac it also helps TP/Delphi compat a bit, though I can't quickly remember a single place where Delphi uses them. In 32-bit Delphi it is nearly fully legacy)
FPC is like Delphi, but since it supports the TP libs (Delphi doesn't), there is some use, also in the textmode IDE. Also, the compiler and utils themselves originally used shortstrings. However, we have been moving away from that for some time now (since 2000).
Well, I used BP myself long time ago, and had to port my code to GPC at some time. Short strings were really one of the minor issues, since the BP interfaces are all on the Pascal level, so they simply changed to long strings in GPC. Apart from the rare access to "[0]" or layout in binary files (which I didn't use much anyway), there was nothing much to be changed (except for sometimes declaring a bigger maximum size :-).
In time, the core compiler itself (symbol tables etc.) will be the only non-legacy code using shortstrings. This is because the size limit is no problem for identifiers/tokens etc., and shortstrings are quite a bit faster.
This could be because their static nature allocates them together with the objects, while dynamic strings would need an extra allocation,
I suppose you're talking about Delphi's string variant, which is only dynamic AFAIR. However, Extended Pascal's strings can be static or (explicitly) dynamic, just like other types, so you can use them mostly the same way as short strings, with no extra overhead except for a few more bytes to store the bigger length field and the capacity.
Note that FPC also advises using the ansistring type as much as possible. Dynamic string types are the future; static types with larger bounds will perish too. This is because more and more libraries (under the influence of GNU policy) specify quite high maximum string sizes, and putting very large strings on the stack would inflate stack usage enormously.
I think for some cases, limited strings are still useful; for other cases, unlimited strings are preferable. EP strings leave the choice to the programmer (use pointers or not). Also, using unlimited strings doesn't necessarily mean wasting stack space, e.g. when passing a string by value to an undiscriminated EP string parameter, only the actual size is allocated on the stack (and with "const" parameters, we don't have to copy the string at all, of course). There are more problems when the size is not known in advance (e.g., on input from external sources, or with function results whose size would have to be known before calling the function). In this case, explicit pointers can be used.
This is true on a high-level, when no external interfaces are a concern. When it comes to binary file formats (as you mentioned), or network protocols, or external binary interfaces (libraries or, like here, the OS), things such as storage size, alignment and byte order matter, and those are not described in Pascal.
Correct. And it is not just the interfaces, but also demos, documentation and nearly all Pascal code available on the Mac.
As I said WRT BP above, I think such code, as long as it doesn't use external short-string interfaces, can probably be converted to (EP) long strings with minimal effort.
Also keep in mind that for BP (and the Mac Pascal compilers) the UCSD type was pretty much the only one. While BP had a few pchar helper kludges, these were mostly standalone.
Yes, and quite incompatible. OTOH, BP short strings were mostly compatible with standard Pascal strings. Except for the well-known "[0]" issue (*), binary layout, and some obscure problem with "const" parameters, they were source-code compatible for the most part.
However, FPC and GPC have other automated first-tier string types. This means that besides what TP can do, additional conversions to the native string types are at least an option.
This makes interfacing between Carbon UI code or TP legacy code and other parts that use native strings easier (and avoids people sticking with shortstrings for the rest of their app because of legacy).
In practice this means auto conversion by assignment and passing to value parameters. FPC/Delphi do this already.
We do that for CStrings, so most of the time we can avoid using CStrings in Pascal code this way. An automatic back-conversion (e.g., for function results) also seems possible, though we don't do that yet. Real reference parameters would be a problem, but they are rarely needed (I suppose, also WRT short strings in Mac OS).
So this might indeed be an option for short strings (i.e., no direct support, but automatic conversions). It will certainly be easier to implement, but it lacks some features (such as binary compatibility), so I might go for full support anyway, if it doesn't present insurmountable problems ...
I believe I mentioned this before, but for FPC/Delphi compat it would be absolutely great btw if
setlength(stringtype,newlength) was implemented for all string types as the counterpart of length(). This allows keeping new string code somewhat type- (and compiler-) independent.
Of course. We have SetLength for EP strings, and when we add short strings (or any other kinds of strings), we'll certainly extend it to them, too.
The shortstring implementation is simply something like
procedure setlength(var s:shortstring;x:integer);
begin
  if x>255 then x:=255;
  s[0]:=chr(x);
end;
This only works for strings of capacity 255. The following routines (taken from our "a little bit of GPC compatibility for BP" unit, gpc-bp.pas) work with all capacities. (*) Using them in BP code avoids the "[0]" access, except in SetLength itself, further increasing the source-code compatibility with BP.
{$P+}
function GetStringCapacity (var s: String): Integer;
{ NOTE: the parameter must be var (not const), otherwise BP gets the capacity wrong! }
begin
  GetStringCapacity := High (s)
end;

procedure SetLength (var s: String; NewLength: Integer);
{ Min and Max are not BP built-ins; they are assumed to be defined elsewhere in gpc-bp.pas }
begin
  s[0] := Chr (Min (GetStringCapacity (s), Max (0, NewLength)))
end;
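To try the capacity-independent routines above outside BP, here is a Free Pascal port. Assumptions: Min and Max come from FPC's Math unit (BP has no built-in versions, and the thread doesn't show where gpc-bp.pas gets them), and SetLength is renamed SetStringLength to avoid clashing with FPC's built-in SetLength. The same call is clamped differently depending on each variable's capacity.

```pascal
program SetLengthDemo;
{$mode objfpc}{$H-}{$P+}   { $P+: "var s: String" parameters become open strings }
uses Math;                  { Min, Max }

function GetStringCapacity(var s: String): Integer;
begin
  Result := High(s);        { the actual variable's capacity, thanks to $P+ }
end;

{ Renamed from SetLength; added characters are not initialized }
procedure SetStringLength(var s: String; NewLength: Integer);
begin
  s[0] := Chr(Min(GetStringCapacity(s), Max(0, NewLength)));
end;

var
  Small: String[5];
  Big: String[200];
begin
  Small := 'abc';
  SetStringLength(Small, 100);   { clamped to the capacity 5 }
  Big := 'abc';
  SetStringLength(Big, 100);     { capacity 200, so 100 fits }
  WriteLn(Length(Small), ' ', Length(Big));   { 5 100 }
  SetStringLength(Small, -3);    { negative lengths clamp to 0 }
  WriteLn(Length(Small));        { 0 }
end.
```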
-- don't know if that's required). This would generally require copying the whole string.
This is not a problem in general. A hybrid system always has penalties, and people _choose_ to use it. Mostly subsystems are internally one string type, and only the interfaces between the subsystems aren't.
I hope so. OTOH, I've seen in the past a lot of BP programmers use CStrings (i.e., "PChar") throughout in their Pascal code, after they were added to BP (version 6 or 7), probably also because of Borland's marketing them as the next big thing. (I admit I almost fell for it myself, but when I saw the drawbacks, I converted back to short strings what I had changed to CStrings already, fortunately, so I could easily convert the code to EP strings with GPC later.)
Copying conversions are required for literals anyway, but they can't be avoided, since shortstrings are not lazily assigned.
I'm not sure if that's required. We could probably also emit the literals as short strings when needed (like we do with literals used as CStrings). We'll see which will be easier in the end.
Frank
Frank Heckenbach wrote:
Forwarded from: marcov@stack.nl (Marco van de Voort)
In gmane.comp.compilers.gpc, you wrote:
I think for some cases, limited strings are still useful, for other cases, unlimited strings are preferable. EP strings leave the choice to the programmer (use pointers or not).
I quite agree with that; the choice should be the programmer's.
Regards,
Adriaan van Os
On 4 Jul 2006 at 14:56, Frank Heckenbach wrote: [....]
This could be because their static nature allocates them together with the objects, while dynamic strings would need an extra allocation,
I suppose you're talking about Delphi's string variant which is only dynamic AFAIR.
I believe he is talking about AnsiStrings, not string variants.
[...]
This is not a problem in general. A hybrid system always has penalties, and people _choose_ to use it. Mostly subsystems are internally one string type, and only the interfaces between the subsystems aren't.
I hope so. OTOH, I've seen in the past a lot of BP programmers use CStrings (i.e., "PChar") throughout in their Pascal code, after they were added to BP (version 6 or 7), probably also because of Borland's marketing them as the next big thing.
They were unavoidable for WinAPI programming.
Best regards, The Chief -------- Prof. Abimbola A. Olowofoyeku (The African Chief) web: http://www.greatchief.plus.com/
Prof A Olowofoyeku (The African Chief) wrote:
On 4 Jul 2006 at 14:56, Frank Heckenbach wrote: [....]
This could be because their static nature allocates them together with the objects, while dynamic strings would need an extra allocation,
I suppose you're talking about Delphi's string variant which is only dynamic AFAIR.
I believe he is talking about AnsiStrings, not string variants.
That's what I meant. I didn't mean "variant" as in "variant record" (which would make no sense here), but as in a language variant, or dialect.
BTW, I refuse to call them AnsiStrings myself, since AFAIK no ANSI standard describes these strings, certainly not the ANSI Pascal standards.
[...]
This is not a problem in general. A hybrid system always has penalties, and people _choose_ to use it. Mostly subsystems are internally one string type, and only the interfaces between the subsystems aren't.
I hope so. OTOH, I've seen in the past a lot of BP programmers use CStrings (i.e., "PChar") throughout in their Pascal code, after they were added to BP (version 6 or 7), probably also because of Borland's marketing them as the next big thing.
They were unavoidable for WinAPI programming.
The question (as just discussed WRT short strings on Mac OS and CStrings also on Unix) is whether to use them throughout the Pascal program, or to convert strings for the interfaces. Borland could have done the latter hidden in the units, and never (or hardly ever) even exposed them to Pascal programmers. Unfortunately they did the former, perhaps in a misguided attempt to overcome the 255 chars limitation (which they did, but at the cost of big programming discomfort). But that's all past and we can't change it. Fortunately, with short strings, these kinds of issues are smaller, since the source code changes (between short strings and EP strings) are less severe.
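The second approach, converting at the interface, can be sketched in Free Pascal as follows. os_count_spaces is a hypothetical stand-in for a C-style OS routine (a real unit would declare it external), and CountSpaces is the Pascal-level wrapper such a unit would export, so callers never see a PChar.

```pascal
program WrapCInterface;
{$mode objfpc}{$H+}

{ Hypothetical stand-in for a C-style OS routine taking a NUL-terminated string }
function os_count_spaces(p: PChar): LongInt;
var
  n: LongInt;
begin
  n := 0;
  while p^ <> #0 do
  begin
    if p^ = ' ' then
      Inc(n);
    Inc(p);
  end;
  Result := n;
end;

{ The Pascal-level wrapper a unit would export: callers pass a normal
  string, and the conversion to PChar happens only here. }
function CountSpaces(const s: String): LongInt;
begin
  Result := os_count_spaces(PChar(s));
end;

begin
  WriteLn(CountSpaces('no PChar in sight here'));   { 4 }
end.
```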
Frank