Re: wchar

4 Oct 2004


Scott Moore wrote:
> ...
> Frank Heckenbach wrote:
> > ...
> > BTW, Scott, you argued partly as if Pascal was based on ASCII. As
> > you certainly know, this is not required, and in fact, many
> > non-English-speaking countries use an extended charset (e.g.
> > ISO-8859-n). I do this myself with GPC today (of course, not for
> > program identifiers, but for `Char' data, just to avoid confusion).
> > While Latin1 (ISO-8859-1), and only this one, is upward compatible
> > with Unicode, none of them are compatible with UTF-8 (except for the
> > ASCII subset), so your compatibility arguments already fall down
> > here. And in your I/O list, you'd definitely have to add I/O in an
> > 8-bit charset. How to select the charset to convert Unicode to and
> > from is another question, but there must be an (easy) way, because
> > that's what a large part of the world uses today.
> Didn't mean to sound Americentric :-)
> My understanding of the ISO code pages is that the characters outside
> of ASCII are in the codes > 127. So, for example, IP Pascal
> specifically leaves the 8th bit unmolested, so it would handle I/O in
> other ISO code pages fine, and would accept ISO code pages as source,
> since it treats c < 32 or c > 127 as characters to be ignored.
That's a valid decision, according to ISO Pascal, but not the one
we've made, or which I personally like. Ignoring control characters
is not exactly my idea, and characters > 127 (usually interpreted in
ISO-8859-n) have been in use for a long time ...
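To illustrate why the charset has to be selectable for 8-bit I/O: the same byte value above 127 stands for a different character in each ISO-8859-n part. A minimal Python sketch, with Python chosen only for illustration; `RAW` and `interpret` are hypothetical names, while `latin-1`, `iso8859_5` (Cyrillic) and `iso8859_7` (Greek) are standard Python codecs:

```python
import unicodedata

# One byte above 127, e.g. as read from an 8-bit text file.
RAW = bytes([0xE4])

def interpret(raw: bytes, charset: str) -> str:
    """Decode 8-bit data using an explicitly selected charset."""
    return raw.decode(charset)

for charset in ("latin-1", "iso8859_5", "iso8859_7"):
    ch = interpret(RAW, charset)
    # Print the code point and character name rather than the glyph itself.
    print(f"{charset:10} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The loop reports U+00E4 (a with diaeresis), U+0444 (Cyrillic small letter ef) and U+03B4 (Greek small letter delta) for the very same byte, so nothing in the data itself tells a program which ISO page was meant.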
...
> UTF-8/Unicode is certainly not compatible with the ISO code page idea,
> but rather replaces it. So certainly, UTF-8 is designed to be
> compatible with the ASCII code set and no other.
Exactly.
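The ASCII compatibility of UTF-8 rests on two properties that can be checked directly: ASCII text encodes to identical bytes, and every byte of a multi-byte sequence is 0x80 or above, so it can never be mistaken for an ASCII character. Latin1 text beyond ASCII, by contrast, is generally not valid UTF-8 at all. A short Python sketch; `utf8_bytes` is a hypothetical helper:

```python
def utf8_bytes(text: str) -> list[int]:
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))

# ASCII text: the UTF-8 bytes are exactly the ASCII bytes.
assert utf8_bytes("begin end;") == list(b"begin end;")

# Non-ASCII: every byte of the multi-byte sequence has the high bit set.
for ch in ("\u00fc", "\u20ac", "\U0001F600"):  # u-umlaut, euro sign, an emoji
    assert all(b >= 0x80 for b in utf8_bytes(ch))

# Latin1 text outside the ASCII subset is not valid UTF-8.
latin1 = "Gr\u00fc\u00dfe".encode("latin-1")  # "Gruesse" with u-umlaut and sharp s
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)
```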
...
> My take on it is that I support ISO code pages in the 8-bit mode, and
> Unicode replaces ISO code pages in the 16-bit mode. Does the upward
> compatibility suck for Europe?
Not too much in Western Europe, since Latin1 is (intentionally, of
course) a proper subset of Unicode. But still, of course, files in
8-bit Latin1 and UTF-8 are not compatible. There will be both kinds of
files to deal with, apart from 16-bit (perhaps 21-bit, stored as
32-bit) Unicode files, so a full solution will probably have to
support them all.
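One possible way for a reader to cope with all these kinds of files, sketched in Python purely as an assumption: `decode_file_bytes` is a hypothetical helper, it only recognizes 16- and 32-bit Unicode files by their byte order mark, and the "try UTF-8, else Latin1" step is a heuristic, since some Latin1 byte sequences happen to be valid UTF-8. The final fallback relies on Latin1 being a subset of Unicode: every byte decodes to the code point with the same value, so that step cannot fail.

```python
import codecs

def decode_file_bytes(data: bytes) -> str:
    """Hypothetical helper: guess which encoding a file's bytes use.
    Heuristic only, not a reliable charset detector."""
    # 32-bit Unicode with a byte order mark. Check this before UTF-16,
    # because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return data.decode("utf-32")
    # 16-bit Unicode with a byte order mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")
    # UTF-8; "utf-8-sig" also strips an optional UTF-8 BOM.
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass
    # Fallback: 8-bit Latin1, where byte value == Unicode code point.
    return data.decode("latin-1")

# Usage: the same text stored in four different ways.
text = "Gr\u00fc\u00dfe"
for enc in ("latin-1", "utf-8", "utf-16", "utf-32"):
    assert decode_file_bytes(text.encode(enc)) == text
```

Python's `utf-16` and `utf-32` encoders write a byte order mark, which is what lets the first two checks fire in the usage loop; files without a BOM would need some other convention.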
...
> Certainly. I see the resolution of that being Unicode internal
> processing, i.e., "world centered" code. The beauty of UTF-8 (and
> other forms) is that nobody has to know or care that my programs are
> Unicode internally.
Mostly yes. But when, e.g., storing data in a file (even data
consisting only of Latin1 characters, as long as some of them are
outside ASCII), there is a difference between UTF-8 and 8-bit coding.
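To make that storage difference concrete, a small Python sketch (the sample string is arbitrary): text made only of Latin1 characters takes one byte per character in 8-bit Latin1, but two bytes per non-ASCII character in UTF-8. A UTF-8 file misread as Latin1 decodes without any error, yet yields different text.

```python
# Text made only of Latin1 characters, three of them outside ASCII.
text = "Gr\u00fc\u00dfe aus K\u00f6ln"

latin1 = text.encode("latin-1")  # one byte per character
utf8 = text.encode("utf-8")      # two bytes per non-ASCII character

print(len(text), len(latin1), len(utf8))  # 14 14 17
print(latin1[2:3], utf8[2:4])             # b'\xfc' b'\xc3\xbc'

# Reading the UTF-8 bytes as 8-bit Latin1 does not raise an error;
# it silently produces different characters.
misread = utf8.decode("latin-1")
print(misread == text, len(misread))      # False 17
```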
Frank
-- 
Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/, 7977168E
GPC To-Do list, latest features, fixed bugs:
http://www.gnu-pascal.de/todo.html
NEW! GPC download signing key: ACB3 79B2 7EB2 B7A7 EFDE  D101 CD02 4C9D 0FE0 E5E8
