Re: FourCharCode

8 Jul 2003


      Frank Heckenbach wrote:
...
Gale Paeper wrote:
...
After examining some of the implementation mechanics in this area, I
think there is a fundamental disconnect in the type system created when
literal string constants of the one string-element form (e.g., 'c',
'''', etc.) are classified as LEX_STRCONST in the lexer.  (This probably
applies to the Borland #20 and ^I character constant extensions also.)
The language rule distinguishing between char type and string type for
literals is based solely upon lexical context.
For literals? String literals of length 1 are just the same as char
literals, aren't they?
With a quibble or two, yes.  (The quibbling would be over using a more
precise terminology to avoid misunderstandings with the apostrophe-image case.)
...
(BP char literals are not governed by the
standards, but it seems reasonable to treat them the same as those,
and that seems to be what BP does.)
Seems like a reasonable treatment to me also.  But as you say, they
aren't governed by the standards, so there isn't anything authoritative to
fall back on when there are problems (vague documentation, buggy
implementation(s), etc.) in determining the correct behavior.
...
...
In discarding the
distinction at the lexer level, the information necessary to
distinguish between char-type and string-type classification for one
char element entities is lost and there is no language rule available
outside the lexical context which can reliably be used to reconstruct
the lost information.
For literals, AFAICS, the condition length = 1 (or length <= 1 in
EP) is all that's required.
Actually, for EP, it is still length = 1.  In EP, the length = 0 case is
defined to be canonical-string-type.  (You get from a null, length = 0,
string to a blank char through the blank padding rule in the assignment
compatibility rules.  I do note that `string_may_be_char' does correctly
handle the null string assignment compatibility case.)
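The blank padding rule mentioned above can be sketched as a toy C model. This is only an illustration of the rule as described in this message, not GPC's code: the names `assign_blank_padded' and `assign_to_char' are hypothetical, and a char target is simply modelled as a fixed string of capacity 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GPC: copy a string value of length
   `len` into a fixed-capacity target and pad the remainder with blanks.
   Returns false when the value is too long to fit the target. */
bool assign_blank_padded(char *dst, size_t capacity,
                         const char *src, size_t len)
{
    assert(dst != NULL && (src != NULL || len == 0));
    if (len > capacity)
        return false;                       /* value does not fit */
    if (len > 0)
        memcpy(dst, src, len);              /* copy the actual elements */
    memset(dst + len, ' ', capacity - len); /* blank-pad the rest */
    return true;
}

/* Hypothetical wrapper: a char target is modelled as capacity 1, so a
   null (length 0) string ends up as a single blank. */
bool assign_to_char(char *dst, const char *src, size_t len)
{
    return assign_blank_padded(dst, 1, src, len);
}
```

In this model, a length 0 value stored into a char yields ' ', a length 1 value is stored unchanged, and anything longer is rejected.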
...
...
In ISO 7185 with the limited constant declaration
and string-type capabilities, one probably could deduce (based on the
inferences in the constrained constant construction possibilities) the
type based on string length outside the lexical context; however, this
isn't possible in ISO 10206 since there are a multitude of ways to
construct constant one char element entities of string-type which are
not also char-types.
So the important difference is between literals and other string
constants. In GPC, this is expressed by the flag
`PASCAL_TREE_FRESH_CST' (maybe it would be reasonable to rename it
to something with `LITERAL'), and that's in fact what I used in the
latter fix. So a constant result of `SubStr' etc. now does not have
this flag set, and `string_may_be_char' checks this flag.
Thanks for pointing out the `PASCAL_TREE_FRESH_CST' flag.  I hadn't
picked up on that while wading through the code.  If I'm not
misunderstanding the meaning of it (i.e., the constant string was defined
by a source code literal), then the lexical context is sufficiently
preserved and therefore enables one to determine whether the (internally
represented) string constant is of type char or is of type
canonical-string.  (Given the flag's usage, my concerns regarding
discarding essential lexical information no longer apply.)
Assuming you've gotten all the flag bookkeeping details working
correctly, I think the addition of the flag check to
`string_may_be_char' yields a good fix for the original SubStr problem
as well as all the other char-type (ordinal) versus string-type
problems I was seeing.
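The idea behind the flag check can be sketched with a toy C model. This is not GPC's actual source: `struct str_const', `make_literal', `make_substr' and `may_be_char_type' are hypothetical names, and the `from_literal' field merely plays the role that `PASCAL_TREE_FRESH_CST' is described as playing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a compiler's string-constant node (hypothetical). */
struct str_const {
    const char *chars;   /* first element of the constant */
    size_t length;       /* number of elements */
    bool from_literal;   /* set only for constants written as literals */
};

/* A constant that appears directly as a literal in the source text. */
struct str_const make_literal(const char *s, size_t n)
{
    struct str_const c = { s, n, true };
    return c;
}

/* A constant computed from another one, as SubStr would produce
   (1-based start); the literal flag is deliberately not propagated. */
struct str_const make_substr(const struct str_const *s,
                             size_t start, size_t count)
{
    assert(s != NULL && start >= 1 && start - 1 + count <= s->length);
    struct str_const c = { s->chars + (start - 1), count, false };
    return c;
}

/* May the constant be classified as char type?  Only a one-element
   constant that came straight from a literal qualifies. */
bool may_be_char_type(const struct str_const *c)
{
    assert(c != NULL);
    return c->from_literal && c->length == 1;
}
```

With this model, the literal 'x' qualifies as char type, while the one-element result of taking a substring of 'abc' does not, even though both have length 1.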
...
...
(With
the present GPC limitations in supporting some ISO 10206 constructs for
constant declarations, I don't think I can construct working test cases
to demonstrate (or check for) problems in this area, so this is an
observation based upon my inspection of the compiler code.  I don't profess
to have an expert understanding of the compiler internals, so I could be
mistaken on the Pascal code effects.)
I suppose you mean things like this (which doesn't work in the next
GPC anymore):
program Foo;
begin
  case 'x' of
    SubStr ('abc', 1, 1) .. 'z':
  end
end.
Something like that, only a little more devious in that the defining
location is further separated from the using location.
I did come up with a test case which I was able to get through the
latest, unpatched, compiler release without encountering an internal
compiler error.  Although it contains two errors, the program compiled,
ran, and produced an output of 'Fail'.
program CharAndStrTypeTest(input, output);
const
    kCharA = 'A'; {type char}
    kAlsoCharA = kCharA; {type char}
    kStrB = 'B' + ''; {type canonical string}
begin
    case kStrB of {WRONG}
        kAlsoCharA: {OK};
        kStrB: writeln('Fail'); {WRONG}
    end;
end.
For a more complete coverage of EP's constant string of length one
possibilities, the packed array[1 .. 1] of char case needs to be
checked; however, I can't think of a way to get a true Pascal constant
defined with that type.  For that, you need EP's
structured-value-constructors, which haven't been implemented yet in GPC.
Gale Paeper
gpaeper@empirenet.com
