Frank Heckenbach wrote:
Gale Paeper wrote:
After examining some of the implementation mechanics in this area, I think there is a fundamental disconnect in the type system created when literal string constants of the one string-element form (e.g., 'c', '''', etc.) are classified as LEX_STRCONST in the lexer. (This probably applies to the Borland #20 and ^I character constant extensions also.)
The language rule distinguishing between char type and string type for literals is based solely upon lexical context.
For literals? String literals of length 1 are just the same as char literals, aren't they?
With a quibble or two, yes. (The quibbling would be over using more precise terminology to avoid misunderstandings with the apostrophe-image case.)
(BP char literals are not governed by the standards, but it seems reasonable to treat them the same as those, and that seems to be what BP does.)
Seems like a reasonable treatment to me also. But as you say, they aren't governed by the standards, so there isn't anything authoritative to fall back on for determining the correct behavior when there are problems (vague documentation, buggy implementation(s), etc.).
By discarding the distinction at the lexer level, the information needed to classify one-char-element entities as char-type or string-type is lost, and no language rule outside the lexical context can reliably reconstruct it.
For literals, AFAICS, the condition length = 1 (or length <= 1 in EP) is all that's required.
Actually, for EP, it is still length = 1. In EP, the length = 0 case is defined to be canonical-string-type. (You get from a null, length = 0, string to a blank char through the blank-padding rule in the assignment-compatibility rules. I do note that `string_may_be_char' correctly handles the null-string assignment-compatibility case.)
In ISO 7185, with its limited constant-declaration and string-type capabilities, one probably could deduce the type from the string length outside the lexical context (by inference from the few ways a constant can be constructed). However, this isn't possible in ISO 10206, since there are many ways to construct constant one-char-element entities of string-type that are not also of char-type.
So the important difference is between literals and other string constants. In GPC, this is expressed by the flag `PASCAL_TREE_FRESH_CST' (maybe it would be reasonable to rename it to something with `LITERAL'), and that's in fact what I used in the latter fix. So a constant result of `SubStr' etc. now does not have this flag set, and `string_may_be_char' checks this flag.
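To make the rule concrete, here is a minimal C sketch (C being the language GPC itself is written in) of the decision described above: a string constant may be treated as char only if it came straight from a source literal and has exactly one element. It does not reproduce GPC's actual `string_may_be_char'. The struct `str_const', its fields and the function `may_be_char' are hypothetical names. The sketch assumes that a literal flag in the role of `PASCAL_TREE_FRESH_CST' is set only on constants written directly as literals and is cleared on results of `SubStr', concatenation, etc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a string constant inside the compiler. */
struct str_const {
    size_t length;   /* number of string elements */
    bool is_literal; /* assumed set only for constants written as source
                        literals (the role played by PASCAL_TREE_FRESH_CST) */
};

/* Returns true if the constant may be classified as char-type:
   it must be a literal AND have exactly one element.  Constants built
   by SubStr, concatenation, etc. have is_literal cleared, so they stay
   canonical-string-type even when their length is 1.
   The separate length-0 assignment-compatibility case is not modeled. */
static bool may_be_char(const struct str_const *c)
{
    assert(c != NULL);
    return c->is_literal && c->length == 1;
}
```

Under this model, 'A' (length 1, literal) may be a char, while 'B' + '' (length 1, not a literal) remains a string. The design choice is that length alone never decides char-ness. The literal flag carries the lexical context that would otherwise be lost after lexing.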
Thanks for pointing out the `PASCAL_TREE_FRESH_CST' flag. I hadn't picked up on that while wading through the code. If I'm not misunderstanding its meaning (i.e., the constant string was defined by a source-code literal), then the lexical context is sufficiently preserved, which makes it possible to determine whether the (internally represented) string constant is of char-type or of canonical-string-type. (Given the flag's usage, my concerns about discarding essential lexical information no longer apply.)
Assuming you've gotten all the flag bookkeeping details right, I think adding the flag check to `string_may_be_char' is a good fix for the original Substr problem, as well as for all the other char-type versus string-type problems with ordinal types that I was seeing.
(With GPC's present limitations in supporting some ISO 10206 constructs for constant declarations, I don't think I can construct working test cases to demonstrate (or check for) problems in this area, so this is an observation based on my inspection of the compiler code. I don't profess to have an expert understanding of the compiler internals, so I could be mistaken about the effects on Pascal code.)
I suppose you mean things like this (which no longer works in the next GPC):

program Foo;
begin
  case 'x' of
    SubStr ('abc', 1, 1) .. 'z':
  end
end.
Something like that, only a little more devious, with the defining location further separated from the using location.
I did come up with a test case that gets through the latest, unpatched compiler release without an internal compiler error. Although it contains two errors, the program compiled, ran, and printed 'Fail'.
program CharAndStrTypeTest(input, output);

const
  kCharA = 'A';          {type char}
  kAlsoCharA = kCharA;   {type char}
  kStrB = 'B' + '';      {type canonical string}

begin
  case kStrB of          {WRONG}
    kAlsoCharA: {OK};
    kStrB: writeln('Fail'); {WRONG}
  end;
end.
For more complete coverage of EP's constant-string-of-length-one possibilities, the packed array[1 .. 1] of char case also needs to be checked. However, I can't think of a way to define a true Pascal constant of that type. That requires EP's structured-value-constructors, which haven't been implemented in GPC yet.
Gale Paeper gpaeper@empirenet.com