Forwarded from: marcov@stack.nl (Marco van de Voort)
In gmane.comp.compilers.gpc, you wrote:
(a summary from a Mac outsider. Peter, Adriaan, feel free to correct)
(Besides Mac it also helps TP/Delphi compat a bit, though I can't quickly remember a single place where Delphi uses them. In 32-bit Delphi it is nearly fully legacy)
FPC is like Delphi, but since it supports the TP libs (Delphi doesn't), there is some use, also in the textmode IDE. Also the compiler and utils themselves originally used shortstrings. However, we have been moving away from that for some time now (since 2000 already). In time, the core compiler itself (symbol tables etc) will be the only non-legacy code using shortstrings. This is because the size limit is no problem for identifiers/tokens etc, and it is quite a bit faster.
This could be because the static nature allocates them together with the objects, while dynamic strings would be an allocation extra, but IIRC the compiler uses pointers to shortstring.
On a tangent, what do you think is the future of short strings anyway?
Dodo.
Note that FPC also advises using the ansistring type as much as possible. Dynamic string types are the future; static types with larger boundaries will perish also. This is because more and more libraries (under influence of GNU policy) specify their maximum string sizes quite high. Putting very large strings on the stack will hugely inflate stack size.
Also more and more OSes allow really deep dir structures and long dir names. We have already had complaints about a forgotten 1024 char limitation in filenames in some routine.
Does Mac OS support those at all, with a different interface, or what?
The base libs under OS X are so *nix that we pretty much use the *BSD rtl also for Darwin. Carbon has shortstrings mostly (and not just [255] afaik), and Cocoa is Objective-C.
Don't think so. Don't bother, unless there is something that needs them. Shortstrings are Carbon and TP. Nearly all of that code is not unicode safe anyway, and if there is a major investment that will go in the direction of dynamic strings.
Yes.
(As discussed before, UTF-8 doesn't make for a string type in Pascal.)
_some_ routines must be adapted for UTF-8. However most routines keep on working, due to the properties of the encoding.
As far as I see it the problem is that Apple expected more Cocoa uptake (keep in mind that Apple comes with Java as a major subsystem). However that system is not very friendly for external use, at least from our viewpoint.
While a Cocoa capable GPC would be cool, this will probably be a significant amount of work, and because you'd be the first Pascal compiler to do so to my knowledge, that would mean a slow process and lots of additional work (converting demo programs, documenting GPC/Pascal specific workarounds, etc).
So for now and for some time to come, Pascal on Mac is Carbon, mainly because the barrier to use Cocoa on a large scale is probably very significant.
Correct. And it is not just the interfaces, but also demos, documentation and nearly all Pascal code available on the Mac.
Also keep in mind that for BP (and the Mac Pascal compilers) the UCSD type was pretty much the only one. While BP had a few pchar helper kludges, these were mostly standalone.
However FPC and GPC have other automated first tier stringtypes. This means that besides the stuff that TP can do, additional conversions to the native stringtypes are at least an option.
This makes the interface between Carbon UI code or TP legacy code to other parts that use native strings easier. (and avoids that people stick with shortstrings for the rest of their app because of legacy)
In practice this means auto conversion by assignment and passing to value parameters. FPC/Delphi do this already.
I believe I mentioned this before, but for FPC/Delphi compat it would be absolutely great btw if
setlength(stringtype,newlength) was implemented for all string types as the opposite of length(). This allows keeping new string code somewhat type (and compiler) independent. The shortstring implementation is simply something like
procedure setlength(var s:shortstring;x:integer);
begin
  if x>255 then x:=255;
  s[0]:=chr(x);
end;
-- don't know if that's required). This would generally require copying the whole string.
This is not a problem in general. A hybrid system always has penalties, and people _choose_ to use it. Mostly subsystems are internally one string type, and only the interfaces between the subsystems aren't.
Copying conversions are required for literals anyway, but they can't be avoided, since shortstrings are not lazily assigned.
Delphi/FPC Ansistring does the same. But I believe I described it in detail before already.
Frank Heckenbach wrote:
Forwarded from: marcov@stack.nl (Marco van de Voort)
Some work has already been done by the GNU Modula team: http://en.wikipedia.org/wiki/Objective_Modula-2
Regards,
Adriaan van Os
Well, I used BP myself a long time ago, and had to port my code to GPC at some time. Short strings were really one of the minor issues, since the BP interfaces are all on the Pascal level, so they simply changed to long strings in GPC. Apart from the rare access to "[0]" or layout in binary files (which I didn't use much anyway), there was nothing much to be changed (except for sometimes declaring a bigger maximum size :-).
I suppose you're talking about Delphi's string variant which is only dynamic AFAIR. However, Extended Pascal's strings can be static or (explicitly) dynamic, just like other types, so you can use them mostly the same way as short strings, and there is no extra overhead, except for a few bytes more to store the bigger length field and the capacity.
I think for some cases, limited strings are still useful, for other cases, unlimited strings are preferable. EP strings leave the choice to the programmer (use pointers or not). Also, using unlimited strings doesn't necessarily mean wasting stack size, e.g. when passing a string by value to an undiscriminated EP string parameter, only the actual size is allocated on the stack (and with "const" parameters, we don't have to copy the string at all, of course). There are more problems when the size is not known in advance (e.g., on input from external sources, or with function results whose size would have to be known before calling the function). In this case, explicit pointers can be used.
As I said WRT BP above, I think such code, as long as it doesn't use external short-string interfaces, can probably convert to (EP) long strings with minimal effort.
Yes, and quite incompatible. OTOH, BP short strings were mostly compatible with standard Pascal strings -- except for the well-known "[0]" issue (*), and binary layout, and some obscure problem with "const" parameters, but for the most part they were source-code compatible.
We do that for CStrings, so most of the time we can avoid using CStrings in Pascal code this way. An automatic back-conversion (e.g., for function results) also seems possible, though we don't do that yet. Real reference parameters would be a problem, but they are rarely needed (I suppose, also WRT short strings in Mac OS).
So this might indeed be an option for short strings (i.e., no direct support, but automatic conversions). It will certainly be easier to implement, but it lacks some features (such as binary compatibility), so I might go for the full support anyway, if it doesn't present insurmountable problems ...
Of course. We have SetLength for EP strings, and when we add short strings (or any other kinds of strings), we'll certainly extend it to them, too.
This only works for strings of capacity 255. The following routines (taken from our "a little bit of GPC compatibility for BP" unit, gpc-bp.pas) work with all capacities. (*) Using them in BP code avoids the "[0]" access, except in SetLength itself, further increasing the source-code compatibility with BP.
{$P+}
function GetStringCapacity (var s: String): Integer;
{ NOTE: the parameter must be var (not const), otherwise BP gets the capacity wrong! }
begin
  GetStringCapacity := High (s)
end;

procedure SetLength (var s: String; NewLength: Integer);
begin
  s[0] := Chr (Min (GetStringCapacity (s), Max (0, NewLength)))
end;
I hope so. OTOH, I've seen in the past a lot of BP programmers use CStrings (i.e., "PChar") throughout in their Pascal code, after they were added to BP (version 6 or 7), probably also because of Borland's marketing them as the next big thing. (I admit I almost fell for it myself, but when I saw the drawbacks, I converted back to short strings what I had changed to CStrings already, fortunately, so I could easily convert the code to EP strings with GPC later.)
Copying conversions are required for literals anyway, but they can't be avoided, since shortstrings are not lazily assigned.
I'm not sure if that's required. We could probably also emit the literals as short strings when needed (like we do with literals used as CStrings). We'll see which will be easier in the end.
Frank
On 4 Jul 2006 at 14:56, Frank Heckenbach wrote: [....]
I believe he is talking about AnsiStrings, not string variants.
[...]
They were unavoidable for WinAPI programming.
Best regards, The Chief -------- Prof. Abimbola A. Olowofoyeku (The African Chief) web: http://www.greatchief.plus.com/
Prof A Olowofoyeku (The African Chief) wrote:
That's what I meant. I didn't mean "variant" as in "variant record" (which would make no sense here), but as in a language variant, or dialect.
BTW, I refuse to call them AnsiStrings myself, since AFAIK no ANSI standard describes these strings, certainly not the ANSI Pascal standards.
The question (as just discussed WRT short strings on Mac OS and CStrings also on Unix) is whether to use them throughout the Pascal program, or to convert strings for the interfaces. Borland could have done the latter hidden in the units, and never (or hardly ever) even exposed them to Pascal programmers. Unfortunately they did the former, perhaps in a misguided attempt to overcome the 255 chars limitation (which they did, but at the cost of big programming discomfort). But that's all past and we can't change it. Fortunately, with short strings, these kinds of issues are smaller, since the source code changes (between short strings and EP strings) are less severe.
Frank