Waldek Hebisch wrote:
Adriaan van Os wrote:
Waldek Hebisch wrote:
Consider the following program:
program fstr(output);
type
  sat = packed array [1..16] of 'K'..'O';
var
  sa : sat;
begin
  sa := 'OK';
  writeln('Sizeof(sa) = ', Sizeof(sa));
  writeln(sa)
end.
ATM it "works". However, it seems that this program is illegal. Namely, ISO says:
: Each value of a string-type shall be structured as a one-to-one
: mapping from an index-domain to a set of components possessing
: the char-type
And later it again speaks about `the char-type'. So it seems that subranges of char type are _not_ allowed as component types of strings.
There is a (maybe somewhat curious) definition of strings in the ISO 7185 Pascal Report (third edition) in section 6.2:
"An array type is called a *string type* if it is packed, if it has as its component type the predefined type *Char* and if it has as its index type a subrange of *Integer* from 1 to n, for n greater than 1"
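As an illustration of that definition, here is a minimal sketch. The type names are hypothetical, and the comments give one reading of the rule quoted above, not a specific compiler's diagnostics. Only the first declaration meets all four conditions (packed, component type Char, index type starting at 1, n greater than 1):

```pascal
program strtypes(output);

type
  { A string type: packed, component type Char, index type 1..n with n > 1 }
  str2 = packed array [1..2] of char;
  { Not a string type: n = 1, so "n greater than 1" fails }
  str1 = packed array [1..1] of char;
  { Not a string type: the component type is a subrange of Char, not Char }
  subrangearr = packed array [1..16] of 'K'..'O';
  { Not a string type: the array is not packed }
  plainarr = array [1..2] of char;

var
  s: str2;

begin
  { The literal 'OK' is a string of length 2, so it matches str2 exactly }
  s := 'OK';
  writeln(s)
end.
```

Under this reading, `str1` and `subrangearr` are ordinary array types, so assigning a literal to them or passing them to `writeln` as strings would be outside ISO 7185.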
The report clearly speaks of "the predefined type Char". But, if I am correct, the definition seems to imply that we also have to reject (in ISO modes) the following:
{$standard-pascal}
program teststr( Output);
var
  s: packed array[ 1..1] of char;
begin
  s:= '?';
  writeln( s)
end.
Yes, thanks for the test program. The restriction is unnatural and EP allows the program above, but AFAICS it is illegal in ISO 7185.
Indeed, it seems so. I suppose the intention was that 'x' is a char literal and not a string literal, and that excluding the case above was an accident. Still, for full compatibility I guess we should issue an error, perhaps with a message speaking of "an obscure ISO 7185 Pascal restriction" (cf. the message for `{ ... *)' comments).
From an implementation point of view, subranges can be packed more tightly than the char type, so such a restriction makes some sense.
Now, the question is: what shall we do? Do you agree with my reading of the standard? Shall we disallow the program above, or accept it as an extension but report an error in standard mode?
The latter, I would say. Will there be any consequences when GPC starts to support Unicode character sets?
I have now disallowed such things. If one wants to accept them, there is some real work to do. For instance, we also accepted:
program fstr1;
type
  sat = packed array [1..16] of 'K'..'O' value [otherwise 'O'];
var
  sa : sat;
begin
  sa := '111';
end.
To handle this properly, one would have to implement a special range-checking routine (the character '1' lies outside 'K'..'O'). Also, both programs crash with the 4.0 backend due to a type mismatch (one would have to add a conversion).
Previously I was undecided, but this argument convinces me that we should forbid such "strings" (as you did). I can't see a useful purpose for them, let alone one that justifies the additional complications.
Concerning Unicode: I am not sure whether Unicode strings should be compatible with normal ones. Since normal chars are smaller, normal strings and Unicode strings cannot be compatible in var parameters. In a value context the compiler could generate a conversion, but then we are in the messy business of code pages.
There was a discussion last Jun/Jul in c.l.p.m, "Sets and portability", with a large part about Pascal and Unicode, where I stated some of my views. In short, I think there should be a single `Char' type internally, and thus a single kind of string, in keeping with the standards, where `Char' is always a character (not a byte that may be part of a character representation, as is `char' in C).
Conversion should be done on I/O (so there should be ways to set the charset on files, probably by default using the standard locale environment variables, plus ways to set it explicitly per file) and by explicit conversion calls. We could have the possibility of building GPC with an 8-bit (as now) or a 32-bit (Unicode) `Char' type, but IMHO these had better be compile-time options, resulting, e.g., in two separate compiled RTS libraries (built from the same source code, of course).
When `Char' is a Unicode type, one could, of course, declare a subrange `Chr (0) .. Chr (255)', and possibly make it 8-bit if we allow the `Size' attribute for chars. But it would only hold Latin1 chars (by the definition of Unicode), and given that Latin1 only works for a few (though major) languages, and e.g. not for the Euro symbol (which requires Latin9 instead), I don't think a special facility for such Latin1 8-bit strings with internal conversions is justified. (That refers to the original question. For charsets other than Latin1 or plain ASCII (Chr (0) .. Chr (127)), a corresponding char type wouldn't be a subrange of the Unicode `Char' at all, since their "high" characters are encoded differently.)
However, I also think more detailed discussions about Unicode implementation should be postponed until someone actually plans to do anything about it.
Frank