Forwarded from: marcov(a)stack.nl (Marco van de Voort)
In gmane.comp.compilers.gpc, you wrote:
(a summary from a Mac outsider. Peter, Adriaan, feel free to correct)
>> ... snip ...
>> >
>> > So basically, I'm just saying that the lack of short strings is
>> > seriously hindering GPC adoption on the Mac, and that that should
>> > probably be taken into account when pondering the priority of their
>> > implementation.
>
> I see. When I have more than a few days to work at GPC (currently,
> I'm still busy with other work, you know, the paying kind ...),
> I'll see if I can do it. Unless Waldek beats me to it ...
(Besides the Mac, it also helps TP/Delphi compatibility a bit, though I
can't offhand recall a single place where Delphi uses them. In 32-bit
Delphi they are almost entirely legacy.)
FPC is like Delphi here, but since it supports the TP libs (Delphi doesn't),
there is some use, also in the textmode IDE. The compiler and utils
themselves originally used shortstrings too. However, we have been moving
away from that for some time now (since 2000). In time, the core compiler
(symbol tables etc.) will be the only non-legacy code still using
shortstrings. This is because the size limit is no problem for
identifiers/tokens etc., and shortstrings are quite a bit faster.
This could be because their static nature allocates them together with the
objects, while dynamic strings would need an extra allocation, but IIRC the
compiler uses pointers to shortstring.
> On a tangent, what do you think is the future of short strings
> anyway?
Dodo.
Note that FPC also advises using the ansistring type as much as
possible. Dynamic string types are the future; static types with larger
bounds will perish as well. This is because more and more libraries (under
the influence of GNU policy) specify quite high maximum string sizes.
Putting very large strings on the stack will inflate stack size
enormously.
Also, more and more OSes allow really deep dir structures and long dir
names. We have already had complaints about a forgotten 1024 char
limitation on filenames in some routine.
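
To put rough numbers on the stack argument, here is a minimal FPC sketch.
The type name TStaticName and the 1024-char bound are made up for
illustration (the bound only echoes the filename limit mentioned above).

```pascal
program StackCost;
{$mode objfpc}{$H+}

type
  { Hypothetical fixed-size filename buffer, not an RTL type. }
  TStaticName = array[0..1023] of Char;

var
  fixedName: TStaticName;
  dynName: AnsiString;
begin
  { A static buffer occupies its full size wherever it is declared,
    so every stack frame holding one as a local pays for all of it. }
  WriteLn('static buffer      : ', SizeOf(fixedName), ' bytes');
  { An ansistring variable is only a pointer; the characters are on the heap. }
  WriteLn('ansistring variable: ', SizeOf(dynName), ' bytes');
  { The classic shortstring: one length byte plus 255 characters. }
  WriteLn('shortstring        : ', SizeOf(ShortString), ' bytes');
end.
```

A recursive routine with a few such static locals multiplies that cost by
the recursion depth, whereas the dynamic variant costs one pointer per
local regardless of how long the string gets.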
> Does Mac OS support those at all, with a different interface, or what?
The base libs under OS X are so *nix that we pretty much use the *BSD rtl
for Darwin too. Carbon mostly has shortstrings (and not just [255] afaik),
and Cocoa is Objective-C.
> Secondly, charsets. When 16-bit charsets (i.e., Unicode) become common (or
> are already), will/should there be a 16-bit "short" string (i.e., max.
> 65535 characters)?
Don't think so. Don't bother, unless there is something that needs them.
Shortstrings are Carbon and TP. Nearly all of that code is not Unicode-safe
anyway, and if there is a major investment, it will go in the direction of
dynamic strings.
> Or else, if the interfaces will/do use UTF-8, we'll
> need conversions (automatic and/or manual) between Unicode and UTF-8
> anyway?
Yes.
> (As discussed before, UTF-8 doesn't make for a string type in
> Pascal.)
_Some_ routines must be adapted for UTF-8. However, most routines keep
working, due to the properties of the encoding.
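
A small FPC sketch of what I mean. Utf8CodePoints is a hypothetical
helper, not an RTL routine; it only illustrates the kind of routine that
needs adapting, while Pos with an ASCII needle works unchanged.

```pascal
program Utf8Props;
{$mode objfpc}{$H+}

{ Hypothetical helper: count code points by skipping UTF-8
  continuation bytes, which all have the bit pattern 10xxxxxx. }
function Utf8CodePoints(const s: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  path: AnsiString;
begin
  { 'caf', then U+00E9 as its two UTF-8 bytes C3 A9, then '/x'.
    Byte escapes keep the example independent of the source encoding. }
  path := 'caf'#$C3#$A9'/x';
  WriteLn('bytes      : ', Length(path));         { Length counts bytes }
  WriteLn('code points: ', Utf8CodePoints(path)); { adapted routine }
  { Bytes below 128 never occur inside a multi-byte sequence, so a plain
    byte search for an ASCII separator cannot produce a false match. }
  WriteLn('separator  : ', Pos('/', path));
end.
```

Length reports more bytes than there are characters, so anything that
indexes or truncates by character count has to be adapted. Searching,
concatenation and comparison for equality work on the bytes as they are.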
> Just to figure out if supporting short strings will only be a
> short-lived solution and more work in this area will be required soon
> anyway ...
As far as I can see, the problem is that Apple expected more Cocoa uptake
(keep in mind that Apple ships Java as a major subsystem). However, that
system is not very friendly for external use, at least from our viewpoint.
While a Cocoa-capable GPC would be cool, it would probably be a significant
amount of work, and because you'd be the first Pascal compiler to do so to
my knowledge, that would mean a slow process and lots of additional work
(converting demo programs, documenting GPC/Pascal-specific workarounds,
etc.).
So for now and for some time to come, Pascal on the Mac is Carbon, mainly
because the barrier to using Cocoa on a large scale is probably very
high.
> This is true on a high-level, when no external interfaces are a
> concern. When it comes to binary file formats (as you mentioned), or
> network protocols, or external binary interfaces (libraries or, like
> here, the OS), things such as storage size, alignment and byte order
> matter, and those are not described in Pascal.
Correct. And it is not just the interfaces, but also demos, documentation
and nearly all Pascal code available on the Mac.
> BTW, in the case of short strings, they're better described as an
> array, starting from 0, with the Ord of the 0'th element
> representing the length, as e.g. BP does explicitly. A record would
> need an Integer field for the length, and, apart from alignment
> etc., it's not at all certain this is of the same size as a Char,
> even if declared as a subrange (e.g., in GPC it wouldn't be --
> except for a packed record, which in turn wouldn't e.g. allow
> passing the elements by reference which e.g. BP short strings do
> allow).
Also keep in mind that for BP (and the Mac Pascal compilers) the UCSD type
was pretty much the only one. While BP had a few pchar helper kludges, these
were mostly standalone.
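
For readers who haven't seen the layout being discussed, a minimal sketch
in FPC (which follows BP here) showing element 0 acting as the length
byte:

```pascal
program ShortLayout;
{$mode objfpc}

var
  s: ShortString;
begin
  s := 'Pascal';
  { Element 0 is the length byte; the characters start at element 1. }
  WriteLn('Ord(s[0]) = ', Ord(s[0]));
  WriteLn('Length(s) = ', Length(s));
  { Storage is fixed: one length byte plus 255 characters. }
  WriteLn('SizeOf(s) = ', SizeOf(s));
  { Writing the length byte truncates in place; no data is moved. }
  s[0] := Chr(3);
  WriteLn(s);
end.
```

The last line prints only the first three characters, since nothing but
the length byte decides where the string ends.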
However, FPC and GPC have other automated first-tier string types. This
means that besides the stuff that TP can do, additional conversions to the
native string types are at least an option.
This makes the interface between Carbon UI code or TP legacy code and
other parts that use native strings easier (and keeps people from sticking
with shortstrings for the rest of their app because of legacy).
In practice this means automatic conversion on assignment and when passing
to value parameters. FPC/Delphi do this already.
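
A minimal sketch of how that looks in FPC. ShowLegacy is a made-up
stand-in for a TP- or Carbon-style routine taking a shortstring by value.

```pascal
program AutoConv;
{$mode objfpc}{$H+}

{ Hypothetical legacy-style routine with a shortstring value parameter. }
procedure ShowLegacy(s: ShortString);
begin
  WriteLn('legacy routine got ', Length(s), ' chars');
end;

var
  legacy: ShortString;
  native: AnsiString;
begin
  legacy := 'from TP-style code';
  native := legacy;                           { shortstring -> ansistring }
  native := native + StringOfChar('x', 300);  { grow past 255 chars }
  legacy := native;                           { ansistring -> shortstring }
  WriteLn('native: ', Length(native), ' chars');
  WriteLn('legacy: ', Length(legacy), ' chars');
  ShowLegacy(native);                         { converted at the call }
end.
```

Note that the conversion back to shortstring silently truncates at 255
characters, which is the price of the hybrid approach.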
I believe I mentioned this before, but for FPC/Delphi compatibility it
would be absolutely great btw if
setlength(stringtype,newlength) were implemented for all string types as
the counterpart of length(). This makes it possible to keep new string code
somewhat type (and compiler) independent. The shortstring implementation is
simply something like

  procedure setlength(var s: shortstring; x: integer);
  begin
    { clamp to what a single length byte can hold }
    if x > 255 then x := 255;
    if x < 0 then x := 0;
    s[0] := chr(x);  { element 0 stores the length }
  end;
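
To show why this helps, here is a sketch using FPC's built-in SetLength,
which accepts both string types. FillShort and FillAnsi are hypothetical
helpers; the point is only that their bodies are identical.

```pascal
program SetLenDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper operating on a shortstring. }
procedure FillShort(var s: ShortString; n: Integer);
var
  i: Integer;
begin
  SetLength(s, n);
  for i := 1 to Length(s) do
    s[i] := '-';
end;

{ Same body, only the parameter type differs. }
procedure FillAnsi(var s: AnsiString; n: Integer);
var
  i: Integer;
begin
  SetLength(s, n);
  for i := 1 to Length(s) do
    s[i] := '-';
end;

var
  a: ShortString;
  b: AnsiString;
begin
  FillShort(a, 10);
  FillAnsi(b, 10);
  WriteLn(a);
  WriteLn(b);
  WriteLn(a = b);
end.
```

Code written against length()/setlength() rather than s[0] can be moved
from one string type to the other by changing only the declarations.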
> -- don't know if that's required). This would generally require
> copying the whole string.
This is not a problem in general. A hybrid system always has penalties, and
people _choose_ to use it. Mostly, subsystems use one string type
internally, and only the interfaces between the subsystems don't.
Copying conversions are required for literals anyway, and they can't be
avoided, since shortstrings are not lazily assigned.
> Ironically, converting to C-Strings (input parameters only, as long as the
> strings do not contain Chr (0)) is easier, since one can keep the string
> in place and only has to add a Chr (0) if space is reserved in advance (as
> GPC does). That's why we have less of such problems with C-string based OS
> interfaces (POSIX, Dos, Windows, ...), as most strings parameters are
> input, and extra work is only required for the few other cases.
Delphi/FPC ansistring does the same. But I believe I have already
described that in detail before.
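
For completeness, a minimal FPC sketch of the input-parameter case:
because the ansistring data is kept #0-terminated, a typecast to PChar is
enough and nothing is copied. strlen from the strings unit stands in for
an arbitrary C-style routine here.

```pascal
program CStrDemo;
{$mode objfpc}{$H+}

uses
  strings;  { for strlen }

var
  s: AnsiString;
  p: PChar;
begin
  s := 'readme.txt';
  { The cast only reinterprets the pointer to the existing data,
    which already has a terminating #0 behind the last character. }
  p := PChar(s);
  WriteLn('Length(s) = ', Length(s));
  WriteLn('strlen(p) = ', strlen(p));
end.
```

As in the quoted text, this only holds for input parameters and for
strings that contain no Chr(0) themselves.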