Frank Heckenbach wrote:
Waldek Hebisch wrote:
... snip ...
I do not like the concept of "compilation modes". Unicode is not merely a huge collection of code points -- a truly Unicode-aware program is likely to use different algorithms and data structures. Sure, many programs will work without changes, but then so will programs using 8-bit bytes and UTF-8.
I agree. All units/libraries would have to exist in both compilation modes (and would need to be tested twice), etc. And, of course, once other issues appear that "want" their own compilation modes, we end up with 2^n copies of the libraries, which is even less practical.
Not if the internal char type is 32 bits. Once that decision is made, the only problem is how to supply and describe the file interfaces, and (of secondary importance) the interconnection with C code. I think we can already eliminate the need for a compact internal char representation on embedded systems, because the fixed overhead is already monstrous.
Note also that full Unicode requires 21 bits (and because of combining chars you still cannot identify characters with Unicode code points -- e.g., 'é' can be the single code point U+00E9 or the sequence U+0065 U+0301). Glibc normally uses a 32-bit `wchar_t', which is another argument for 32-bit chars.
I also agree. If anything, then 32 bits immediately (or just match C's `wchar_t', which the backend seems to provide for us -- I still have to check the details).
It is "good Wirth/Pascal" tradition to offer only one size. But IHMO it is _not_ GNU Pascal tradition.
Of course, even Wirth's Pascal has integer subranges, so if we could define an ASCII char as a subrange of a Unicode char, we might be fine. But this would mean (a) `Char' would be Unicode (the standard type must be the biggest one if we want to follow the standard's "spirit" here), which is too big a change, I fear, and (b) it would not cater for a UTF-8 type (in particular such a string type), which is neither an array of ASCII chars nor of Unicode chars.
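To illustrate the subrange idea, here is a hypothetical sketch which assumes `Char' were the full Unicode type -- i.e., exactly the change feared in (a); the type names are invented:

  type
    AsciiChar  = #0 .. #127;   { 7-bit subrange of the hypothetical Unicode `Char' }
    Latin1Char = #0 .. #255;   { 8-bit subrange }

A UTF-8 string fits neither subrange, since its elements are bytes of a multi-byte encoding rather than characters -- which is objection (b).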
So I think we should (must) leave `Char' as it is. Besides the usual suspects -- binary files and other protocols which depend on data type layout -- changing `Char' would mean breaking most programs that handle text(!) files with 7-bit (i.e., ASCII) and 8-bit charsets. We can't realistically do that.
Binary file interfaces can simply be handled with a subrange of integer, such as "TYPE byte = 0..255;". This means the system has to adjust storage usage to the cardinality of the subrange. UTF-8 need never be an internal format, as long as routines are provided to convert between UTF-8 strings and internal Unicode strings.
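For illustration, a minimal sketch of such a conversion routine in GPC-style Pascal; the names `CodePoint' and `DecodeUtf8' are invented here, and malformed or overlong sequences are not rejected:

  type
    CodePoint = 0 .. $10FFFF;  { 21-bit Unicode code point range }

  { Decode one UTF-8 sequence starting at s[i] and advance i past it. }
  function DecodeUtf8 (const s: String; var i: Integer): CodePoint;
  var
    b, n, k, cp: Integer;
  begin
    b := Ord (s[i]);
    if b < $80 then
      begin cp := b;         n := 0 end   { 1 byte: plain ASCII }
    else if b < $E0 then
      begin cp := b and $1F; n := 1 end   { 2-byte sequence }
    else if b < $F0 then
      begin cp := b and $0F; n := 2 end   { 3-byte sequence }
    else
      begin cp := b and $07; n := 3 end;  { 4-byte sequence }
    for k := 1 to n do  { fold in the continuation bytes }
      cp := (cp shl 6) or (Ord (s[i + k]) and $3F);
    i := i + n + 1;
    DecodeUtf8 := cp
  end;

The encoding direction is symmetric, and a loop over such calls turns a whole UTF-8 string into an array of wide chars.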
The amount of storage dedicated to a char need not affect programs. My PascalP of 20 years ago used 16-bit storage on the HP3000, and 8-bit on byte-addressing machines. Of course, the HP3000 didn't actually use any chars requiring more than 8 bits. Both machines generated identical output from identical input.
Using wide internal char storage caters to future machines that do not use byte addressing, or that, in C terms, have a much larger byte.
... snip ...
I personally think that we could use such a type as the Pascal wide character type, but if that is too controversial, then I propose to just add an interfacing type now and postpone the question of proper Unicode support.
I'd also postpone it (since we both probably have enough other things to do first), but implementing such a type now (layout-compatible with C's `wchar_t', an abstract ordinal type, a meaningful name) might be reasonable ...
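A sketch of what such an interfacing type might look like in GPC syntax; the names are invented, and the 32-bit size is an assumption that matches glibc's `wchar_t' but would have to be verified per platform:

  type
    WideChar  = Cardinal attribute (Size = 32);  { assumption: wchar_t is 32 bits, as in glibc }
    PWideChar = ^WideChar;

  { C interfacing example; the C result type is really size_t,
    shown as Cardinal here for simplicity: }
  function wcslen (s: PWideChar): Cardinal; external name 'wcslen';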
I agree that postponement is in order. Full range-checking should have priority. However, things should be done with a view to future paths. This may well include two types of text files, say `text' and `atext' (and maybe `utext'). The narrower forms can be supplied by subrange definitions at the outermost (level 0) scope, making them easily customizable.
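A sketch of how those file types might look; the names follow the suggestion above, and the line structuring that distinguishes `text' from a plain `file of ...' is ignored here:

  type
    AsciiChar = #0 .. #127;        { narrower form, customizable at level 0 }
    atext     = file of AsciiChar; { simplified: real text files are line-structured }
    utext     = file of WideChar;  { using the 32-bit type sketched earlier }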