Re: national character sets

28 Apr 2001


      Prof Abimbola Olowofoyeku wrote:
...
On 27 Apr 01, at 23:51, Frank Heckenbach wrote:
[...]
...
...
...
So I conclude that the charset conversion code in unzip is indeed
quite right.
Ok. Perhaps it's a font problem with Mandrake 7.1.
Maybe. In this case, it might help to distinguish between a charset
(basically, a mapping between characters and numbers) and a font
(particular shapes of letters etc.).
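To illustrate the charset side of that distinction, here is a minimal sketch (not taken from the unzip sources). It assumes the Dos charset in question is code page 437 and uses Python's built-in `cp437` and `latin-1` codecs; the variable names are only illustrative. The same byte value maps to different characters under the two charsets, regardless of which font later draws them.

```python
# One byte value, interpreted under two different charsets.
# Assumption: "Dos charset" here means code page 437.
raw = bytes([0x82])

as_dos = raw.decode("cp437")        # e with acute accent (U+00E9) in cp437
as_latin1 = raw.decode("latin-1")   # U+0082, a C1 control character in latin1

print(repr(as_dos), repr(as_latin1))
```

If the character numbers come out right but the glyphs on screen look wrong, that would point at the font rather than at the conversion.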
I have just had a cursory look at the Info-Zip sources. There is no
portable continuation that I can see, but rather a whole load of IFDEFs,
and, surprise surprise, the win32 conversions use the OEMToxxx API
calls.
My tentative conclusion is that there is no portable way to do this. I 
guess I have to use IFDEFs and platform-specific calls after all. I 
thought there might be a portable GNU library for this, but there 
obviously isn't one.
Did you expect the GNU libraries to provide support for IBM
charsets? ;-)
I took a very brief look at the Info-Zip source. ISTM that the main
things are the iso2oem and oem2iso tables in ebcdic.h (and that "OEM"
indeed refers to the original Dos charset, as I had supposed
initially).
So the "portable" answer could be that the Dos charset is quite an
exotic thing outside of Dos systems, and it's of interest only when
file formats (like zip) based on this charset need to be addressed
-- that's another reason why such conversions are not likely
candidates for general-purpose libraries; they're just not relevant
to so many programs.
Those tables can be taken (probably as-is if the license permits --
I didn't check this) to convert between this charset and latin1 (in
a portable way, since it's only two character tables). latin1 in
itself is reasonably portable and can trivially be converted to
Unicode (the first 256 characters of Unicode are exactly those of
latin1), so that's probably as good as one can get.
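A rough sketch of this two-step approach, under stated assumptions: the 256-entry table below is generated from Python's `cp437` codec as a stand-in, not copied from Info-Zip's tables, whose contents and code page may differ. The names `build_oem_to_latin1`, `OEM2ISO`, `oem_to_latin1`, `latin1_to_unicode` and `FALLBACK` are hypothetical. Dos characters with no latin1 equivalent (box-drawing characters, for example) are replaced by `?`.

```python
# Assumption: the Dos ("OEM") charset is code page 437; the table is built
# from Python's codec rather than taken from Info-Zip's ebcdic.h.
FALLBACK = ord("?")  # used for Dos characters that latin1 cannot represent


def build_oem_to_latin1(codec="cp437"):
    """Build a 256-byte lookup table mapping Dos bytes to latin1 bytes."""
    table = bytearray(256)
    for b in range(256):
        code_point = ord(bytes([b]).decode(codec))
        table[b] = code_point if code_point < 256 else FALLBACK
    return bytes(table)


OEM2ISO = build_oem_to_latin1()


def oem_to_latin1(data: bytes) -> bytes:
    """Convert Dos-charset bytes to latin1 bytes with a plain table lookup."""
    return data.translate(OEM2ISO)


def latin1_to_unicode(data: bytes) -> str:
    """Each latin1 byte value is already its Unicode code point."""
    return "".join(chr(b) for b in data)


name = bytes([0x63, 0x61, 0x66, 0x82])  # "caf" + e-acute, as stored in cp437
print(latin1_to_unicode(oem_to_latin1(name)))
```

Once the table exists as data, the conversion itself needs no platform-specific calls, which is the sense in which it is portable.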
Frank
-- 
Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/
GPC To-Do list, latest features, fixed bugs:
http://agnes.dida.physik.uni-essen.de/~gnu-pascal/todo.html
