Frank Heckenbach wrote:
BTW, Scott, you argued partly as if Pascal was based on ASCII. As you certainly know, this is not required, and in fact, many non-English speaking countries use an extended charset (e.g. ISO-8859-n). I do this myself with GPC today (of course, not for program identifiers, but for `Char' data, just to avoid confusion). While Latin1 (ISO-8859-1), and only this one, is upward compatible with Unicode, none of them is compatible with UTF-8 (except for the ASCII subset), so your compatibility arguments already fall down here. And in your I/O list, you'd definitely have to add I/O in an 8-bit charset. How to select the charset to convert Unicode to and from is another question, but there must be an (easy) way, because that's what a large part of the world uses today.
Didn't mean to sound Amerocentric :-)
My understanding of the ISO code pages is that the characters outside of ASCII all sit in the codes above 127. IP Pascal specifically leaves the 8th bit unmolested, so it would handle I/O in the other ISO code pages OK, and it would accept ISO pages as source, since it treats c < 32 or c > 127 as characters to be ignored.
UTF-8/Unicode is certainly not compatible with the ISO code page idea; it replaces it. UTF-8 is designed to be compatible with the ASCII code set and no other.
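To make the compatibility point concrete, here is a minimal sketch in Python (not Pascal, and not how GPC or IP Pascal do it). The helper names `encodings` and `is_valid_utf8` are made up for illustration. It shows that pure ASCII text has the same bytes in Latin-1 and UTF-8, while a character above 127 does not, and that Latin-1 bytes above 127 are generally not valid UTF-8.

```python
def encodings(text):
    """Return the bytes for `text` under Latin-1 and under UTF-8.

    Hypothetical helper, for illustration only.
    """
    return text.encode("latin-1"), text.encode("utf-8")


def is_valid_utf8(data):
    """Report whether `data` can be decoded as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII text: both encodings produce identical bytes.
ascii_latin1, ascii_utf8 = encodings("Pascal")

# "Stra\u00dfe" contains U+00DF, which lies outside ASCII.
# Latin-1 stores it as the single byte 0xDF (equal to the code point);
# UTF-8 stores it as the two bytes 0xC3 0x9F.
strasse_latin1, strasse_utf8 = encodings("Stra\u00dfe")

print(ascii_latin1 == ascii_utf8)
print(strasse_latin1, strasse_utf8)
print(is_valid_utf8(strasse_latin1))
```

So an ISO-8859-n file can be read as UTF-8 only as long as it stays within the ASCII subset; anything else needs an explicit conversion step.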
My take on it is that I support ISO code pages in the 8 bit mode, and Unicode replaces ISO code pages in the 16 bit mode. Does the upward compatibility suck for Europe? Certainly. I see the resolution of that being Unicode internal processing, i.e., "world centered" code. The beauty of UTF-8 (and the other encoding forms) is that nobody has to know or care that my programs are Unicode internally.
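A rough sketch of that "Unicode inside, chosen charset at the I/O boundary" idea, again in Python rather than Pascal. The function `transcode` and its parameters are assumptions made up for this example; how a Pascal runtime would actually let you select the external charset is the open question Frank raises.

```python
def transcode(data, source_charset, target_charset, errors="strict"):
    """Convert bytes from one external charset to another via Unicode.

    Hypothetical helper: decode at the input boundary, work with Unicode
    text internally, encode again at the output boundary.
    """
    text = data.decode(source_charset)           # external bytes -> Unicode
    # ... any internal processing would operate on `text` here ...
    return text.encode(target_charset, errors)   # Unicode -> external bytes


# Latin-1 input, UTF-8 output.
latin1_in = b"Stra\xdfe"
utf8_out = transcode(latin1_in, "latin-1", "utf-8")

# ISO-8859-15 input: byte 0xA4 is the euro sign there.
euro_utf8 = transcode(b"\xa4", "iso-8859-15", "utf-8")

print(utf8_out, euro_utf8)
```

Going the other way can fail: a Unicode character with no slot in the target 8-bit charset raises `UnicodeEncodeError` under the default `"strict"` policy, so a real I/O layer would need some rule (replace, escape, or reject) for that case.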