At 15:50 +0100 12/3/06, Frank Heckenbach wrote:
Concerning Unicode: I am not sure if Unicode strings should be compatible with normal ones. Since normal chars are smaller, normal strings and Unicode strings cannot be compatible in var parameters. In value context the compiler could generate a conversion, but then we are in the messy business of codepages.
There was a discussion last Jun/Jul in c.l.p.m, "Sets and portability", with a large part about Pascal and Unicode, where I stated some of my views. In short, I think there should be a single `Char' type internally, and thus a single kind of string, in keeping with the standards, where `Char' is always a character (not a byte that may be part of a character representation, as `char' is in C).
Conversion should be done on I/O (so there should be ways to set the charset on files, probably by default using the standard locale environment variables, plus ways to explicitly set it per file), and by explicit conversion calls. We can have the possibility of building GPC with an 8-bit (as now) or a 32-bit (Unicode) `Char' type, but these might better be compile-time options, resulting, e.g., in two separate compiled RTS libraries (built from the same source code, of course), IMHO.
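To illustrate the "convert on I/O" idea, here is a minimal sketch in Python rather than Pascal. The helper name `open_text` and its `charset` parameter are hypothetical and are not an existing GPC or RTS interface. It assumes the default charset comes from the locale settings and that a per-file override is passed explicitly.

```python
import locale


def open_text(path, mode="r", charset=None):
    """Open a text file, converting between bytes and characters at the I/O boundary.

    Hypothetical helper: `charset` overrides the encoding for this one file;
    when omitted, the encoding implied by the locale environment is used.
    """
    if charset is None:
        # Default taken from the locale (LANG, LC_ALL, ... on POSIX systems).
        charset = locale.getpreferredencoding(False)
    # Inside the program every string is a sequence of characters;
    # bytes only exist on the file side of this call.
    return open(path, mode, encoding=charset)
```

With something like this, the same in-memory character is written as one byte to a Latin-1 file and as two bytes to a UTF-8 file, while the program only ever sees the character.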
There may not be any point in supporting Unicode any further. From what I've seen as the trend over the last decade, UTF-8 appears to be winning the battle, being compact in most normal use, 8-bit, and yet supporting the full Unicode range. UTF-8 therefore allows the compiler to continue to ignore the entire issue, except perhaps adding a few support routines (e.g., LengthInCharacters) and/or enhancing the RTS routines to support UTF-8 (not much is really needed).
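A support routine of the LengthInCharacters kind can be very small. The following is a sketch in Python rather than Pascal; the function name is hypothetical, and it assumes the input is already valid UTF-8 and does no validation. It counts code points, not user-perceived characters (combining sequences count as several).

```python
def length_in_characters(data: bytes) -> int:
    """Count the Unicode code points in a UTF-8 encoded byte string.

    Assumes `data` is valid UTF-8; malformed input is not detected.
    """
    # Continuation bytes have the bit pattern 10xxxxxx.
    # Every other byte starts a new code point, so counting those is enough.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


s = "Grüße".encode("utf-8")
print(len(s), length_in_characters(s))  # prints "7 5": 7 bytes, 5 characters
```

The byte length and the character length differ only when non-ASCII characters are present, which is why plain 8-bit string handling keeps working for most existing code.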
Also, it appears that Unicode as a 16-bit standard is winning, so 32-bit chars would probably be extreme too.
However, I also think more detailed discussions about Unicode implementation should be postponed until someone actually plans to do anything about it.
Makes sense. Using UTF-8, there aren't really any current problems using GPC with any character set.
Enjoy, Peter.