Prof Abimbola Olowofoyeku wrote:
On 28 Apr 01, at 4:31, Frank Heckenbach wrote:
[...]
My tentative conclusion is that there is no portable way to do this. I guess I have to use IFDEFs and platform-specific calls after all. I thought there might be a portable GNU library for this, but there obviously isn't one.
Did you expect the GNU libraries to provide support for IBM charsets? ;-)
I can't see the objection to doing so!
It's simply not relevant.
I took a very brief look at the Info-Zip source. ISTM that the main things are the iso2oem and oem2iso tables in ebcdic.h (and that "OEM" indeed refers to the original Dos charset, as I had supposed initially).
So the "portable" answer could be that the Dos charset is quite an exotic thing outside of Dos systems, and it's of interest only when file formats (like zip) based on this charset need to be addressed -- that's another reason why such conversions are not likely candidates for general-purpose libraries; they're just not relevant to so many programs.
Well, I am not sure about this. AFAICS all programs need the facility if they are to display or manipulate text written in one language on systems (such as Windows) using another language.
It's not really about languages, but about charsets. Both ISO-8859-1 and this "OEM" charset mostly cover West-European languages (i.e., accented characters etc., rather than Cyrillic, Greek or other letters). The "OEM" charset seems to be a relic from the 80s which has survived through legacy Dos files (and formats such as zip), so outside of Dos/Windoze and programs specifically written to convert old Dos files or access old Dos formats, this charset is not relevant. (Otherwise, how should a program know when it should convert charsets? I.e., if I use some program and give it some input consisting of normal ASCII characters as well as characters >= #$80, I would expect it to interpret the characters according to my default charset and not convert them, unless it has a special reason to, like working with zip files. Plain text files have no indication about the charset used, so if I want to process a file in a foreign charset, I would normally just convert it (`recode ibmpc:lat1' or something).)
What's more relevant on modern systems is converting between the different ISO-8859-n charsets and Unicode, and AFAIK, the GNU library has support for that.
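As a rough illustration of that kind of conversion, here is a minimal sketch using the iconv() interface that glibc provides. The function name latin1_to_utf8_iconv and its buffer conventions are made up for this example, and it assumes the installed iconv knows the charset names "ISO-8859-1" and "UTF-8" (glibc does; on other systems the names, or the need to link with -liconv, may differ).

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_len bytes of ISO-8859-1 text to UTF-8.
   Writes a NUL-terminated result into out (capacity out_size bytes).
   Returns the number of bytes written, not counting the NUL,
   or (size_t)-1 if the conversion could not be done. */
size_t latin1_to_utf8_iconv(const char *in, size_t in_len,
                            char *out, size_t out_size)
{
    if (out_size == 0)
        return (size_t)-1;

    /* iconv_open takes the target charset first, then the source. */
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;          /* iconv() wants char **, input is not modified */
    char *outp = out;
    size_t in_left = in_len;
    size_t out_left = out_size - 1;  /* keep one byte for the terminating NUL */

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)            /* e.g. E2BIG if out is too small */
        return (size_t)-1;

    *outp = '\0';
    return (size_t)(outp - out);
}
```

Called with the four bytes "caf\xe9" and a 16-byte output buffer, this should yield the five bytes "caf\xc3\xa9". Other ISO-8859-n charsets would only need a different source name in iconv_open, provided the local iconv supports them.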
Those tables can be taken (probably as-is if the license permits -- I didn't check this) to convert between this charset and latin1 (in a portable way, since it's only two character tables). latin1 in itself is reasonably portable and can trivially be converted to Unicode (the first 256 characters of Unicode are exactly those of latin1), so that's probably as good as one can get.
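To make the "trivially converted" part concrete, here is a small sketch that needs no library at all: since every latin1 byte value is also its Unicode code point, encoding to UTF-8 is plain bit shuffling. The function name latin1_to_utf8 and the requirement that the caller supply an output buffer of at least 2 * len bytes are assumptions of this example, not anything taken from Info-Zip.

```c
#include <stddef.h>

/* Hypothetical helper: encode len latin1 bytes as UTF-8.
   out must have room for at least 2 * len bytes.
   Returns the number of bytes written (no NUL is appended). */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned cp = in[i];   /* latin1 byte value == Unicode code point */
        if (cp < 0x80) {
            /* ASCII range: one byte, unchanged */
            out[n++] = (unsigned char)cp;
        } else {
            /* U+0080 .. U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx */
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}
```

For example, the latin1 byte 0xE9 (e with acute accent) comes out as the two bytes 0xC3 0xA9. Going from the Dos charset to Unicode would then be one table lookup (oem2iso) followed by this step.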
Ok - how do you convert these macros into Pascal?
#define ASCII2ISO(c) (((c) & 0x80) ? oem2iso[(c) & 0x7f] : (c))
#define ASCII2OEM(c) (((c) & 0x80) ? iso2oem[(c) & 0x7f] : (c))
if c >= #$80 then Result := OEM2ISO [c] else Result := c
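and declare OEM2ISO with the index range #$80 .. #$ff (rather than #0 .. #$7f), so that the masking with 0x7f from the C macros is not needed. ISO2OEM works the same way.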
Frank