Prof Abimbola Olowofoyeku wrote:
On 27 Apr 01, at 23:51, Frank Heckenbach wrote:
[...]
So I conclude that the charset conversion code in unzip is indeed quite right.
Ok. Perhaps it's a font problem with Mandrake 7.1.
Maybe. In this case, it might help to distinguish between a charset (basically, a mapping between characters and numbers) and a font (particular shapes of letters etc.).
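To make the distinction concrete, here is a minimal Python sketch of a charset as a mapping between numbers and characters. It assumes Python's "cp850" codec as a stand-in for the Dos charset (the exact code page in use is an assumption, not something established in this thread). No font is involved at any point; a font only decides how the resulting character is drawn.

```python
# One byte value, two charsets: which character the number denotes
# depends only on the mapping, not on any font.
raw = bytes([0x81])

as_dos = raw.decode("cp850")       # assumed stand-in for the Dos ("OEM") charset
as_latin1 = raw.decode("latin-1")  # ISO 8859-1

print(repr(as_dos))     # a lowercase u with umlaut
print(repr(as_latin1))  # a C1 control character, not a letter

# Conversely, the same character has a different number in each charset.
u_umlaut = "\u00fc"
print(u_umlaut.encode("cp850").hex(), u_umlaut.encode("latin-1").hex())
```

If the wrong mapping is applied, the text comes out wrong in every font, which is one way to tell a charset problem from a font problem.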
I have just had a cursory look at the Info-Zip sources. There is no portable conversion that I can see, but rather a whole load of IFDEFs, and, surprise surprise, the win32 conversions use the OEMToxxx API calls.
My tentative conclusion is that there is no portable way to do this. I guess I have to use IFDEFs and platform-specific calls after all. I thought there might be a portable GNU library for this, but there obviously isn't one.
Did you expect the GNU libraries to provide support for IBM charsets? ;-)
I took a very brief look at the Info-Zip source. ISTM that the main things are the iso2oem and oem2iso tables in ebcdic.h (and that "OEM" indeed refers to the original Dos charset, as I had supposed initially).
So the "portable" answer could be that the Dos charset is quite an exotic thing outside of Dos systems, and it's of interest only when file formats (like zip) based on this charset need to be addressed -- that's another reason why such conversions are unlikely candidates for general-purpose libraries; they're just not relevant to that many programs.
Those tables can be taken (probably as-is if the license permits -- I didn't check this) to convert between this charset and latin1 (in a portable way, since it's only two character tables). latin1 in itself is reasonably portable and can trivially be converted to Unicode (the first 256 characters of Unicode are exactly those of latin1), so that's probably as good as one can get.
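As a rough illustration of the two-table approach, here is a minimal Python sketch. It does not use the Info-Zip tables themselves; instead it builds two 256-entry tables from Python's "cp850" and "latin-1" codecs, and the choice of cp850 as "the Dos charset" is an assumption. The names build_table, OEM2ISO, ISO2OEM, oem_to_iso, iso_to_oem and latin1_to_unicode are hypothetical and exist only for this example.

```python
def build_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Build a 256-byte table mapping each byte of charset src to charset dst.

    Bytes whose character does not exist in dst map to the fallback byte.
    """
    table = bytearray(256)
    for code in range(256):
        char = bytes([code]).decode(src)
        try:
            table[code] = char.encode(dst)[0]
        except UnicodeEncodeError:
            table[code] = fallback[0]
    return bytes(table)


# Assumption: cp850 stands in for the Dos ("OEM") charset.
OEM2ISO = build_table("cp850", "latin-1")
ISO2OEM = build_table("latin-1", "cp850")


def oem_to_iso(data: bytes) -> bytes:
    """Convert Dos-charset bytes to latin1 bytes with a plain table lookup."""
    return data.translate(OEM2ISO)


def iso_to_oem(data: bytes) -> bytes:
    """Convert latin1 bytes to Dos-charset bytes with a plain table lookup."""
    return data.translate(ISO2OEM)


def latin1_to_unicode(data: bytes) -> str:
    """latin1 byte values are the first 256 Unicode code points."""
    return "".join(chr(b) for b in data)


name = b"M\x81nchen"  # a name with u-umlaut, as it would be stored in the Dos charset
print(latin1_to_unicode(oem_to_iso(name)))
```

Once the two tables exist, the conversion itself is a per-byte lookup with no platform-specific calls, so the same tables could be embedded as constant arrays in any language.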
Frank