Re: Strings and standard Pascal

12 Mar 2006


Waldek Hebisch wrote:
...
Adriaan van Os wrote:
...
Waldek Hebisch wrote:
...
Consider the following program:
program fstr(output);
type sat = packed array [1..16] of 'K'..'O';
var sa : sat;
begin
  sa := 'OK';
  writeln('Sizeof(sa) = ', Sizeof(sa));
  writeln(sa)
end
.
ATM it "works". However, it seems that this program is illegal.
Namely ISO says:
: Each value of a string-type shall be structured as a one-to-one
: mapping from an index-domain to a set of components possessing
: the char-type
And later it again speaks about `the char-type'. So it seems that
subranges of char type are _not_ allowed as component types of strings.
There is a (maybe somewhat curious) definition of strings in the ISO 
7185 Pascal Report (third edition) in section 6.2:
"An array type is called a *string type* if it is packed, if it has as 
its component type the predefined type *Char* and if it has as its 
index type a subrange of *Integer* from 1 to n, for n greater than 1"
The report clearly speaks of "the predefined type Char". But, if I am
correct, the definition seems to imply that we also have to reject (in
ISO modes) the following:
{$standard-pascal}
program teststr( Output);
var s: packed array[ 1..1] of char;
begin
   s:= '?';
   writeln( s)
end.
Yes, thanks for the test program. The restriction is unnatural and EP allows
the program above, but AFAICS it is illegal in ISO 7185.
Indeed, seems so. I suppose the intention was that 'x' is a char
literal and not a string literal, and excluding the case above was
an accident. For full compatibility I guess we should issue an
error, though perhaps one speaking of "an obscure ISO 7185 Pascal
restriction" (cf. the message for `{ ... *)' comments).
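For comparison, here is a minimal sketch of a variant that should stay within ISO 7185 under the reading discussed above: a one-element literal is treated as a char value, and the smallest string type has index range 1..2. The program name `teststr2' and the variable names are made up for illustration.

```pascal
program teststr2(output);
{ Hypothetical variant of teststr; all names are illustrative only. }
var
  c: char;                          { a single character is a plain char }
  s: packed array [1..2] of char;   { smallest string type: upper bound 2 }
begin
  c := '?';     { one-element literal denotes a char value }
  s := '? ';    { two-element literal denotes a string value }
  writeln(c);
  writeln(s)
end.
```

Neither assignment relies on a string type with index range 1..1, so the restriction quoted from the Report does not come into play.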
...
...
...
From an implementation point of view, subranges can be packed more tightly
than the char type, so such a restriction makes some sense.
Now, the question is what shall we do? And do you agree with my reading
of the standard? Shall we disallow the program above? Or maybe
accept it as an extension, but report an error in standard mode?
The latter, I would say. Will there be any consequences when GPC starts
to support Unicode character sets?
I have now disallowed such things. If one wants to accept such things
there is some real work to do. Namely, we also accepted:
program fstr1;
type sat = packed array [1..16] of 'K'..'O' value [otherwise 'O'];
var sa : sat;
begin
  sa := '111';
end
.
To handle this properly one would have to implement a special
range-checking routine. Also, both programs crash with the 4.0 backend due
to a type mismatch (one would have to add a conversion).
Previously I was undecided, but this argument convinces me that we
should forbid such "strings" (as you did). I can't see a useful
purpose for them, let alone one that justifies the additional
complications.
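To give an idea of the extra work, the following is a rough sketch of the kind of range check that accepting such "strings" would need before assigning a string value to an array of 'K'..'O'. It is not GPC code; `AllInRange', `Str4' and the variable names are hypothetical, the length is shortened to 4 to keep the literals readable, and an ASCII-like character ordering is assumed.

```pascal
program rangechk(output);
{ Hypothetical sketch, not GPC code: test a string value against
  the bounds of a char subrange before a checked assignment. }
const
  Len = 4;
type
  Str4 = packed array [1..Len] of char;
var
  good, bad: Str4;

{ True if every component of s lies in lo..hi. }
function AllInRange(var s: Str4; lo, hi: char): boolean;
var
  i: integer;
  ok: boolean;
begin
  ok := true;
  for i := 1 to Len do
    if (s[i] < lo) or (s[i] > hi) then
      ok := false;
  AllInRange := ok
end;

begin
  good := 'OKOK';   { every component is within 'K'..'O' }
  bad  := '111 ';   { digits and the blank are outside 'K'..'O' }
  writeln(AllInRange(good, 'K', 'O'));
  writeln(AllInRange(bad, 'K', 'O'))
end.
```

The first call should report true and the second false. A real implementation would also have to convert between the two array types, which is the type mismatch mentioned for the 4.0 backend.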
...
Concerning Unicode: I am not sure if Unicode strings should be
compatible with normal ones. Since normal chars are smaller,
normal strings and Unicode strings cannot be compatible in
var parameters. In a value context the compiler could generate
a conversion, but then we are in the messy business of codepages.
There was a discussion last Jun/Jul in c.l.p.m "Sets and
portability", with a large part about Pascal and Unicode, where I
stated some of my views. In short, I think there should be a single
`Char' type internally, and thus a single kind of strings, in
keeping with the standards, where `Char' is always a character (not a
byte that may be part of a character representation, as `char' is
in C).
Conversion should be done on I/O (so there should be ways to set the
charset on files, probably by default using the standard locale
environment variables, plus ways to explicitly set it per file), and
by explicit conversion calls. We can have the possibility of
building GPC with an 8 bit (as now) or a 32 bit (Unicode) `Char'
type, but these might better be compile-time options, resulting,
e.g., in two separate compiled RTS libraries (built from the same
source code, of course), IMHO.
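As a rough illustration of what "conversion on I/O" could mean for one particular charset, the sketch below writes a character code as UTF-8 bytes. It is not a proposal for the RTS interface: `WriteUtf8' is a hypothetical name, the code point is passed as a plain integer, and the sketch assumes an 8-bit `Char' for which Chr (0) .. Chr (255) are all valid, an integer type of at least 32 bits, and an argument already known to be a valid code point (no checking is done).

```pascal
program utf8out(output);
{ Hypothetical sketch: emit one code point as UTF-8 bytes on output.
  Assumes 8-bit chars and integers of at least 32 bits. }

procedure WriteUtf8(cp: integer);
begin
  if cp < 128 then
    write(chr(cp))                            { 1 byte: plain ASCII }
  else if cp < 2048 then
    begin                                     { 2 bytes }
      write(chr(192 + cp div 64));
      write(chr(128 + cp mod 64))
    end
  else if cp < 65536 then
    begin                                     { 3 bytes }
      write(chr(224 + cp div 4096));
      write(chr(128 + (cp div 64) mod 64));
      write(chr(128 + cp mod 64))
    end
  else
    begin                                     { 4 bytes }
      write(chr(240 + cp div 262144));
      write(chr(128 + (cp div 4096) mod 64));
      write(chr(128 + (cp div 64) mod 64));
      write(chr(128 + cp mod 64))
    end
end;

begin
  WriteUtf8(8364);   { Euro sign, U+20AC: three bytes E2 82 AC }
  writeln
end.
```

With a per-file charset setting, a routine of this kind would sit behind the ordinary write calls, so that program code only ever sees whole characters.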
When `Char' is a Unicode type, one could, of course, declare a
subrange `Chr (0) .. Chr (255)', and possibly make it 8-bit if we
allow the `Size' attribute for chars. But it would only be Latin1
chars (by the definition of Unicode), and given that Latin1 only
works for a few (though major) languages, but e.g. not for the Euro
symbol (which requires Latin9 instead), I don't think a special
facility for such Latin1-8-bit-strings with internal conversions is
justified. (That's referring to the original question -- for
charsets other than Latin1 or plain ASCII (Chr (0) .. Chr (127)), a
corresponding char type wouldn't be a subrange of Unicode `Char' at
all, since the "high" characters are encoded differently.)
However, I also think more detailed discussions about Unicode
implementation should be postponed until someone actually plans to
do anything about it.
Frank
-- 
Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/, 7977168E
GPC To-Do list, latest features, fixed bugs:
http://www.gnu-pascal.de/todo.html
GPC download signing key: ACB3 79B2 7EB2 B7A7 EFDE  D101 CD02 4C9D 0FE0 E5E8
