Forwarded from: marcov@stack.nl (Marco van de Voort)
In gmane.comp.compilers.gpc, you wrote:
FPC is like Delphi, but since it supports the TP libs (Delphi doesn't), there is some use, also in the textmode IDE. Also, the compiler and utils themselves originally used shortstrings. However, we have been moving away from that for some time now (since 2000).
Well, I used BP myself long time ago, ......... was nothing much to be changed (except for sometimes declaring a bigger maximum size :-).
I had similar experiences with my own codebases, because they were already abstracted for string usage since they were ported back from Modula2 before.
However, in my experience only a small fraction of existing codebases is that clean. See e.g. the average level of filth in SWAG.
In time, the core compiler itself (symbol tables etc.) will be the only non-legacy code using shortstrings. This is because the size limit is no problem for identifiers/tokens etc., and it is quite a bit faster.
This could be because the static nature allocates them together with the objects, while dynamic strings would require an extra allocation.
I suppose you're talking about Delphi's string variant which is only dynamic AFAIR.
Correct, and specifically inside the FPC compiler core. The FPC compiler's speed depends mostly on the memory manager and lexing.
In plain Delphi (and FPC in Delphi mode), "String" means ansistring, while in TP (and FPC in TP mode) the identifier "String" means shortstring.
This makes switching easy, provided the shortstring code is cleaned up (no write beyond the range 1..length(s) and no s[0] use).
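A minimal sketch of what "cleaned up" code might look like, assuming Free Pascal (the function name Trimmed is hypothetical). It reads the length with Length(s) instead of Ord(s[0]), only indexes 1..Length(s), and builds the result with Copy instead of writing s[0], so it should compile unchanged whether "string" means ShortString ($H-) or AnsiString ($H+):

```pascal
program CleanShortStr;
{$mode objfpc}{$H+}  (* flip to $H- and "string" becomes ShortString; the code stays the same *)

(* Hypothetical helper: strip trailing blanks without touching s[0]. *)
function Trimmed(const s: string): string;
var
  n: Integer;
begin
  n := Length(s);                      (* instead of Ord(s[0]) *)
  while (n > 0) and (s[n] = ' ') do    (* never index outside 1..Length(s) *)
    Dec(n);
  Trimmed := Copy(s, 1, n);            (* instead of s[0] := Chr(n) *)
end;

begin
  WriteLn('[', Trimmed('abc   '), ']');
end.
```

The dirty TP idiom would be to assign s[0] := Chr(n) on a var parameter; that compiles only for shortstrings and is exactly what blocks a switch to ansistrings.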
However, Extended Pascal's strings can be static or (explicitly) dynamic, just like other types, so you can use them mostly the same way as short strings, and there is no extra overhead, except for a few bytes more to store the bigger length field and the capacity.
If the static variant is instantiated with the class/object, then yes, it would.
But as I said, there is no real reason to clean shortstrings out of the compiler core, except maybe if we switch to Unicode (or UTF-8) mangled names and identifiers, which will be a major rewrite anyway.
I think for some cases, limited strings are still useful, for other cases, unlimited strings are preferable.
Correct. However, EP now has this static type, and FPC also has one, reasonably transparent. The point is more that it is not worth investing in a second one unless there are very good reasons (and the current Mac Pascal discussion is one for GPC IMHO).
Correct. And it is not just the interfaces, but also demos, documentation and nearly all Pascal code available on the Mac.
As I said WRT BP above, I think such code, as long as it doesn't use external short-string interfaces, can probably convert to (EP) long strings with minimal effort.
It is a barrier. And to a programmer who comes from an all-hands-held commercial compiler, it is yet another one on top of the already existing ones (the change of IDE, project system, misc units and language changes). That's what Peter is trying to point out.
For a new developer everything is new, and _every_ change that forces him to inspect every line of the whole source is a major pain and a risk, even if each change by itself is negligible.
I learned this the hard way during the FPC Delphi-compatibility convergence. You quickly port and fix heaps of those little issues, and then, when it finally compiles, it turns out the program doesn't work anymore. The smaller the changes, the smaller the chance of such a scenario.
However, you and I, and probably the rest of this list, can handle decent debuggers, and if necessary even GDB down to assembler level. Most of the Apple newbies can't, though.
In practice this means auto conversion by assignment and passing to value parameters. FPC/Delphi do this already.
We do that for CStrings, so most of the time we can avoid using CStrings in Pascal code this way. An automatic back-conversion (e.g., for function results) also seems possible, though we don't do that yet. Real reference parameters would be a problem, but they are rarely needed (I suppose, also WRT short strings in Mac OS).
Similarly with FPC/Delphi and ansistrings <-> pchars. Btw, are function results handled by the conversion?
So this might indeed be an option for short strings (i.e., no direct support, but automatic conversions). It will certainly be easier to implement, but it lacks some features (such as binary compatibility), so I might go for the full support anyway, if it doesn't present insurmountable problems ...
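A small sketch of the automatic conversions described above, assuming Free Pascal in objfpc mode; ShortLen and Greeting are hypothetical helpers. Assignment and value parameters convert between ShortString and AnsiString implicitly; going from AnsiString to PChar needs an explicit cast, while a PChar function result can simply be assigned to an AnsiString:

```pascal
program StrConv;
{$mode objfpc}{$H+}

(* Value parameter of type ShortString: an AnsiString argument is converted. *)
function ShortLen(s: ShortString): Integer;
begin
  ShortLen := Length(s);
end;

(* PChar function result, e.g. as returned by a C-style interface. *)
function Greeting: PChar;
begin
  Greeting := 'hello';
end;

var
  a: AnsiString;
  s: ShortString;
  p: PChar;
begin
  s := 'short';
  a := s;                  (* ShortString -> AnsiString on assignment *)
  a := a + ' string';
  s := a;                  (* AnsiString -> ShortString; would truncate past 255 chars *)
  WriteLn(s);
  WriteLn(ShortLen(a));    (* AnsiString passed to a ShortString value parameter *)
  p := PChar(a);           (* AnsiString -> PChar needs an explicit typecast *)
  WriteLn(p);              (* use p before a is reassigned: it points into a's buffer *)
  a := Greeting;           (* PChar function result -> AnsiString on assignment *)
  WriteLn(a);
end.
```

Expected output is "short string", 12, "short string" and "hello". Note that var (reference) parameters get no such conversion, which is the problem case mentioned above.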
I meant these conversions as additional to the BP featureset, to interoperate with your own string type. And afaik they already have some workarounds like that using the records and some operator overloading? (Peter?)
I don't think you can do without full support. Especially the "open array" string compatibility is too important to leave out (more important than the reference thing), because it is the main way to write generic shortstring routines.
The shortstring implementation is simply something like
procedure setlength(var s:shortstring;x:integer);
begin if x>255 then x:=255; s[0]:=chr(x); end;
This only works for strings of capacity 255.
Correct, my shortstringese is a bit rusty. Substitute high(s) for 255 for a general TP-compatible solution.
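A sketch of the high(s) variant, assuming Free Pascal's OpenString parameter type (in TP/BP the same effect comes from a "var s: string" parameter under the $P+ directive). SetLen is a hypothetical name chosen so it doesn't clash with the built-in SetLength; it also clamps negative lengths, which the quick version above would pass straight to chr():

```pascal
program OpenStrSetLen;
{$mode fpc}{$H-}  (* "string" means ShortString (string[255]) here *)

(* High(s) is the declared capacity of whatever string was passed in,
   so string[10] and string[255] arguments are both clamped correctly. *)
procedure SetLen(var s: OpenString; x: Integer);
begin
  if x > High(s) then x := High(s);
  if x < 0 then x := 0;
  s[0] := Chr(x);   (* shortstring length byte; new chars are undefined *)
end;

var
  small: string[10];
  big: string;
begin
  small := 'abc';
  SetLen(small, 100);
  WriteLn(Length(small));   (* clamped to 10, the capacity of string[10] *)
  big := 'abc';
  SetLen(big, 100);
  WriteLn(Length(big));     (* 100 fits in a string[255] *)
end.
```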
This is not a problem in general. A hybrid system always has penalties, and people _choose_ to use it. Mostly subsystems are internally one string type, and only the interfaces between the subsystems aren't.
I hope so. OTOH, I've seen in the past a lot of BP programmers use CStrings (i.e., "PChar") throughout in their Pascal code, after they were added to BP (version 6 or 7), probably also because of Borland's marketing them as the next big thing.
I think more that it was the only way to get around the size limits. Also, TP/BP for Windows was closer to the API than the VCL-based Delphi.
But there certainly was a bit of a myth in TP times that pchars were faster, partially also because they were from "C".
There was some truth in that (shortstrings copy too much), and carefully crafted code could be better, but the speed advantage was vastly exaggerated, especially compared with the fact that it was way easier to do anything wrong with pchars, and complex string code was way more work.
The problem was that the emerging 32-bit C compilers BP was compared to were simply faster because they optimized and were 32-bit (using a 32-bit move() routine for their copies), while BP's code generator was from yesteryear.
(I admit I almost fell for it myself, but when I saw the drawbacks, I converted back to short strings what I had changed to CStrings already, fortunately, so I could easily convert the code to EP strings with GPC later.)
I mostly used shortstrings in that time and pchars only to break that limit if needed. However when that really got important, because the path sizes exploded, I was already using FPC and ansistrings were stable.
Copying conversions are required for literals anyway; they can't be avoided, since shortstrings are not lazily assigned.
I'm not sure if that's required. We could probably also emit the literals as short strings when needed (like we do with literals used as CStrings). We'll see which will be easier in the end.
I meant the following situation:
procedure x;
var s : string;
begin s:='bolalalallala'; end;
On the assignment, the literal must be copied from where the constant is stored to the stack.
Frank Heckenbach wrote:
FPC is like Delphi, but since it supports the TP libs (Delphi doesn't), there is some use, also in the textmode IDE. Also, the compiler and utils themselves originally used shortstrings. However, we have been moving away from that for some time now (since 2000).
Well, I used BP myself long time ago, ......... was nothing much to be changed (except for sometimes declaring a bigger maximum size :-).
I had similar experiences with my own codebases, because they were already abstracted for string usage since they were ported back from Modula2 before.
I think we're talking about two different things. Of course, with abstracted string access, you can change to any model rather easily. What I meant was that BP short strings and EP long strings are mostly source-level compatible (except for "[0]" and such things), so even with non-abstracted code (such as most of mine), changing is quite easy.
However, in my experience only a small fraction of existing codebases is that clean. See e.g. the average level of filth in SWAG.
Haven't looked at SWAG for quite a while. Do people still write a new CRT replacement unit every week? ;-) Well, I guess in some cases even "[0]" is so abundant as to make it painful (and even dirtier tricks relying on the memory layout are probably not uncommon) ...
Correct. However, EP now has this static type, and FPC also has one, reasonably transparent. The point is more that it is not worth investing in a second one unless there are very good reasons (and the current Mac Pascal discussion is one for GPC IMHO).
Yes, I think so.
In practice this means auto conversion by assignment and passing to value parameters. FPC/Delphi do this already.
We do that for CStrings, so most of the time we can avoid using CStrings in Pascal code this way. An automatic back-conversion (e.g., for function results) also seems possible, though we don't do that yet. Real reference parameters would be a problem, but they are rarely needed (I suppose, also WRT short strings in Mac OS).
Similarly with FPC/Delphi and ansistrings <-> pchars. Btw, are function results handled by the conversion?
No, for CString function results, one needs to write CString2String in order to use them as Pascal strings. But we might do this automatically (optionally) in the future.
I meant these conversions as additional to the BP featureset, to interoperate with your own string type. And afaik they already have some workarounds like that using the records and some operator overloading? (Peter?)
Yes, AFAIK. Of course, when we build in a short string type, we need at least automatic conversions from/to the other supported Pascal string types.
I don't think you can do without full support. Especially the "open array" string compatibility is too important to leave out (more important than the reference thing), because it is the main way to write generic shortstring routines.
For binary-level BP compatibility, sure. But as I said, when source compatibility suffices, one can just use EP strings there, because EP schematic parameters (i.e., simply "String") behave mostly like BP open-string parameters (which are also declared simply "String", with a compiler directive that can simply be ignored or omitted in GPC).
For Mac OS, AIUI it's about OS interfaces, and so far I haven't heard they need open strings there.
But I'll probably still go for the full solution anyway.
I hope so. OTOH, I've seen in the past a lot of BP programmers use CStrings (i.e., "PChar") throughout in their Pascal code, after they were added to BP (version 6 or 7), probably also because of Borland's marketing them as the next big thing.
I think more that it was the only way to get around the size limits.
Yes, it was (except hand-made). But since Borland wrote the compiler, they could have provided another way. ;-)
But there certainly was a bit of a myth in TP times that pchars were faster, partially also because they were from "C".
There was some truth in that (shortstrings copy too much), and carefully crafted code could be better, but the speed advantage was vastly exaggerated, especially compared with the fact that it was way easier to do anything wrong with pchars, and complex string code was way more work.
Agreed. Though properly crafted code can usually avoid unnecessary copying with short strings and EP strings as well. (One can always use pointers to strings, which of course open up possibilities for memory leaks, but not most of the other problems CStrings have.)
The problem was that the emerging 32-bit C compilers BP was compared to were simply faster because they optimized and were 32-bit (using a 32-bit move() routine for their copies), while BP's code generator was from yesteryear.
Of course, such comparisons are rubbish.
(I admit I almost fell for it myself, but when I saw the drawbacks, I converted back to short strings what I had changed to CStrings already, fortunately, so I could easily convert the code to EP strings with GPC later.)
I mostly used shortstrings in that time and pchars only to break that limit if needed. However when that really got important, because the path sizes exploded, I was already using FPC and ansistrings were stable.
s/FPC/GPC/;s/"ansistrings"/EP strings/ Me too.
PS: I'll be out of town for a while. I'll probably have occasional email access, but I may not take part in this discussion further. I think the important points have been said.
Frank