Re: national character sets

29 Apr 2001


Prof Abimbola Olowofoyeku wrote:
...
On 28 Apr 01, at 4:31, Frank Heckenbach wrote:
[...]
...
...
My tentative conclusion is that there is no portable way to do this. I
guess I have to use IFDEFs and platform-specific calls after all. I
thought there might be a portable GNU library for this, but there
obviously isn't one.
Did you expect the GNU libraries to provide support for IBM
charsets? ;-)
I can't see the objection to doing so!
It's simply not relevant.
...
...
I took a very brief look at the Info-Zip source. ISTM that the main
things are the iso2oem and oem2iso tables in ebcdic.h (and that "OEM"
indeed refers to the original Dos charset, as I had supposed initially).
So the "portable" answer could be that the Dos charset is quite an
exotic thing outside of Dos systems, and it's of interest only when file
formats (like zip) based on this charset need to be addressed -- that's
another reason why such conversions are not likely candidates for
general-purpose libraries; they're just not relevant to so many
programs.
Well, I am not sure about this. AFAICS all programs need the facility
if they are to display or manipulate text written in one language on
systems (such as Windows) using another language.
It's not really about languages, but about charsets. Both ISO-8859-1
and this "OEM" charset mostly cover West-European languages (i.e.,
accented characters etc., rather than Cyrillic, Greek or other
letters). The "OEM" charset seems to be a relic from the 80s which
has survived through legacy Dos files (and formats such as zip), so
outside of Dos/Windoze and programs specifically written to convert
old Dos files or access old Dos formats, this charset is not
relevant. (Otherwise, how should a program know when it should
convert charsets? I.e. if I use some program and give it some input
consisting of normal ASCII characters as well as characters >= #$80,
I would expect it to interpret the characters according to my
default charset and not convert them, unless it has a special reason
to, like working with zip files. Plain text files have no indication
about the charset used, so if I want to process a file in a foreign
charset, I would normally just convert it (`recode ibmpc:lat1' or
something).)
What's more relevant on modern systems is converting between the
different ISO-8859-n charsets and Unicode, and AFAIK, the GNU
library has support for that.
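As a rough illustration of that kind of conversion, here is a minimal
sketch using the iconv interface. It assumes that "the GNU library" means
iconv as shipped with glibc (or GNU libiconv), and that the charset names
"ISO-8859-1" and "UTF-8" are available on the system. The function name
iso_to_utf8_iconv is made up for this example.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert n bytes of ISO-8859-1 text in src to UTF-8 in dst.
   cap is the size of dst, including room for a terminating NUL.
   Returns the number of bytes written (without the NUL),
   or (size_t)-1 on any failure. Hypothetical helper, not a library call. */
size_t iso_to_utf8_iconv(const char *src, size_t n, char *dst, size_t cap)
{
    if (cap == 0)
        return (size_t)-1;

    /* Note the argument order: target charset first, then source. */
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;      /* charset pair not supported here */

    char *in = (char *)src;     /* iconv() takes char **, not const char ** */
    char *out = dst;
    size_t inleft = n;
    size_t outleft = cap - 1;   /* keep one byte for the NUL */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;      /* e.g. output buffer too small */

    *out = '\0';
    return (size_t)(out - dst);
}
```

Whether a given installation also knows the Dos charset (under a name
such as "CP437") depends on which conversion modules are installed, so
that part would need to be checked at run time via the iconv_open result.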
...
...
Those tables can be taken (probably as-is if the license permits -- I
didn't check this) to convert between this charset and latin1 (in a
portable way, since it's only two character tables). latin1 in itself is
reasonably portable and can trivially be converted to Unicode (the first
256 characters of Unicode are exactly those of latin1), so that's
probably as good as one can get.
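To make the "trivially converted" part concrete: since a latin1 byte value
is already the Unicode code point, encoding it as UTF-8 needs no table at
all. The following is only a sketch; the function name latin1_to_utf8 and
the buffer convention (caller provides 2*n + 1 bytes) are assumptions for
this example.

```c
#include <stddef.h>

/* Convert n latin1 bytes from src to UTF-8 in dst.
   dst must have room for 2*n + 1 bytes (worst case: every input byte
   is >= 0x80 and becomes two bytes, plus the terminating NUL).
   Returns the number of bytes written, not counting the NUL. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII range: same single byte in UTF-8 */
            dst[out++] = c;
        } else {
            /* Code points 0x80..0xFF: two-byte UTF-8 sequence */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));   /* 110000xx */
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F)); /* 10xxxxxx */
        }
    }
    dst[out] = '\0';
    return out;
}
```

For instance, the latin1 byte 0xE9 (e with acute accent) becomes the two
bytes 0xC3 0xA9.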
Ok - how do you convert these macros into Pascal?
#define ASCII2ISO(c) (((c) & 0x80) ? oem2iso[(c) & 0x7f] : (c))
#define ASCII2OEM(c) (((c) & 0x80) ? iso2oem[(c) & 0x7f] : (c))
if c >= #$80 then
  Result := OEM2ISO [c]
else
  Result := c
and declare OEM2ISO with the index range #$80 .. #$ff (rather than
#0 .. #$7f), so the "& 0x7f" masking from the C macros is not needed.
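For comparison, the C side can also be written as a function instead of a
macro, which avoids evaluating the argument twice and sidesteps signed-char
surprises. This is a sketch only: the table below is NOT the Info-Zip
table. It is an illustrative excerpt with the first eight upper-half
entries of code page 437 (assuming that is the "OEM" charset meant), and
the names oem2iso_demo and oem_to_iso are made up for this example.

```c
/* Illustrative excerpt only: code page 437 positions 0x80..0x87 mapped
   to latin1. The other 120 entries are left 0 here; a real table (such
   as oem2iso in Info-Zip's ebcdic.h) defines all 128. */
static const unsigned char oem2iso_demo[128] = {
    0xC7, /* 0x80: C with cedilla    */
    0xFC, /* 0x81: u with diaeresis  */
    0xE9, /* 0x82: e with acute      */
    0xE2, /* 0x83: a with circumflex */
    0xE4, /* 0x84: a with diaeresis  */
    0xE0, /* 0x85: a with grave      */
    0xE5, /* 0x86: a with ring above */
    0xE7  /* 0x87: c with cedilla    */
};

/* Map one byte from the Dos charset to latin1.
   Bytes below 0x80 are plain ASCII and pass through unchanged.
   Entries missing from the demo table come back as '?'. */
unsigned char oem_to_iso(unsigned char c)
{
    if (c < 0x80)
        return c;
    unsigned char m = oem2iso_demo[c - 0x80];  /* C arrays start at 0 */
    return m ? m : '?';
}
```

The "c - 0x80" is the C counterpart of the "& 0x7f" in the macros; in
Pascal the index range of the array takes care of it.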
Frank
-- 
Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/
GPC To-Do list, latest features, fixed bugs:
http://agnes.dida.physik.uni-essen.de/~gnu-pascal/todo.html
