national character sets


Prof. A Olowofoyeku (The African Chief)

4 Apr 2001

5:12 p.m.

Hi all

Are there libraries or routines that are usable with GPC for dealing with character sets? For example the WinAPI has routines such as OemToChar (or OemToAnsi for 16-bit Windows) to deal with converting strings between "oem" character sets and "ansi". I can use these for what I am doing, but I would lose portability while so doing.

Best regards, The Chief --------- Prof. Abimbola Olowofoyeku (The African Chief) Author of Chief's Installer Pro for Win32 Email: African_Chief@bigfoot.com http://www.bigfoot.com/~african_chief/


Frank Heckenbach

23 Apr

5:25 p.m.

Prof. A Olowofoyeku (The African Chief) wrote:

> Are there libraries or routines that are usable with GPC for dealing with character sets? For example the WinAPI has routines such as OemToChar (or OemToAnsi for 16-bit Windows) to deal with converting strings between "oem" character sets and "ansi". I can use these for what I am doing, but I would lose portability while so doing.

Does "OEM charset" mean the charset originally used on DOS? I don't think it's in use anywhere else (except within conversion programs etc.), so you probably won't find conversion routines in standard libraries on other systems. What might be available is conversion between charsets like ISO-8859-n and Unicode (which is trivial for n=1, i.e. the Latin-1 charset, which AFAIK is also the default on (recent?) Windoze versions in west European locales).

You could try to take the code from programs such as recode. I'm not sure what you're trying to achieve -- if you want to make a program able to read old DOS charset files because they're still common on Windoze, you can probably just do no conversion on Unix etc. (where normal programs aren't expected to deal with DOS charset files).

Frank

-- Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/ GPC To-Do list, latest features, fixed bugs: http://agnes.dida.physik.uni-essen.de/~gnu-pascal/todo.html
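
The remark that converting ISO-8859-1 (Latin-1) to Unicode is trivial can be checked directly: every Latin-1 byte value is numerically equal to its Unicode code point. Below is a minimal Python sketch using only the standard codecs; the helper name `latin1_to_unicode` is made up for illustration.

```python
def latin1_to_unicode(data: bytes) -> str:
    """Map each Latin-1 byte to the Unicode code point with the same number."""
    return "".join(chr(b) for b in data)

# The hand-written mapping agrees with Python's built-in Latin-1 codec
# for every possible byte value.
all_bytes = bytes(range(256))
assert latin1_to_unicode(all_bytes) == all_bytes.decode("latin-1")

# Byte 0xFC is U+00FC (u with diaeresis) in both Latin-1 and Unicode.
print(latin1_to_unicode(b"M\xfcller"))
```

The other ISO-8859-n parts would need a lookup table for the upper half of the byte range, since their characters do not line up with the code points of the same number.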

Prof Abimbola Olowofoyeku

11:14 p.m.

On 23 Apr 01, at 17:25, Frank Heckenbach wrote:

> Prof. A Olowofoyeku (The African Chief) wrote:
>
> > Are there libraries or routines that are usable with GPC for dealing with character sets? For example the WinAPI has routines such as OemToChar (or OemToAnsi for 16-bit Windows) to deal with converting strings between "oem" character sets and "ansi". I can use these for what I am doing, but I would lose portability while so doing.
>
> Does "OEM charset" mean the charset originally used on DOS?

No. I am not sure what *precisely* it is supposed to mean, but the idea is to convert between character sets. For example, if you have some text written in Danish, some of the characters would be wrongly displayed in English versions of Windows unless you first convert the text. Here, the Danish text would be the "OEM" characters, and the converted text would be the "ansi" or whatever.

> I don't think it's in use anywhere else (except within conversion programs etc.), so you probably won't find conversion routines in standard libraries on other systems. What might be available is conversion between charsets like ISO-8859-n and Unicode (which is trivial for n=1, i.e. the Latin-1 charset, which AFAIK is also the default on (recent?) Windoze versions in west European locales).
>
> You could try to take the code from programs such as recode. I'm not sure what you're trying to achieve -- if you want to make a program able to read old DOS charset files because they're still common on Windoze, you can probably just do no conversion on Unix etc. (where normal programs aren't expected to deal with DOS charset files).

What I am trying to do is to get the program to recognise and translate non-English characters correctly. For example, in my unzip code, unless I do this, certain characters in filenames inside the zip file (e.g., umlauts or accented characters) might end up being wrong, and the extracted file then gets a wrong name (or creating it fails because the name contains "illegal" characters).

Best regards, The Chief -------- Prof. Abimbola A. Olowofoyeku (The African Chief) Author of: Chief's Installer Pro for Win32 http://www.bigfoot.com/~African_Chief/chief32.htm Email: African_Chief@bigfoot.com
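
The filename problem described above can be reproduced without any Windows API. The sketch below assumes that the "OEM" bytes are in DOS code page 437 and that the "ANSI" side is Latin-1; both are assumptions, since the thread never pins down the actual code pages. The helper name `oem_to_ansi` is hypothetical and only mimics the direction of OemToChar, not its exact behaviour.

```python
def oem_to_ansi(raw: bytes, oem: str = "cp437", ansi: str = "latin-1") -> bytes:
    """Re-encode bytes from an assumed OEM code page to an assumed ANSI one.

    Characters with no equivalent in the target charset become '?'.
    """
    return raw.decode(oem).encode(ansi, errors="replace")

# "Müller.txt" as it would be stored under code page 437 (ü is byte 0x81).
stored = "M\u00fcller.txt".encode("cp437")

# Interpreting the bytes as Latin-1 without conversion turns 0x81 into a
# control character, so the name is displayed or created incorrectly.
unconverted = stored.decode("latin-1")

# Converting first yields the byte that Latin-1 uses for ü (0xFC).
converted = oem_to_ansi(stored)

print(repr(unconverted))
print(converted.decode("latin-1"))
```

If the archive was written under a different code page (for example 850 or 865 for Danish text), the `oem` argument would have to change accordingly.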

Frank Heckenbach

24 Apr

3 a.m.

Prof Abimbola Olowofoyeku wrote:

> On 23 Apr 01, at 17:25, Frank Heckenbach wrote:
>
> > Prof. A Olowofoyeku (The African Chief) wrote:
> >
> > > Are there libraries or routines that are usable with GPC for dealing with character sets? For example the WinAPI has routines such as OemToChar (or OemToAnsi for 16-bit Windows) to deal with converting strings between "oem" character sets and "ansi". I can use these for what I am doing, but I would lose portability while so doing.
> >
> > Does "OEM charset" mean the charset originally used on DOS?
>
> No. I am not sure what *precisely* it is supposed to mean, but the idea is to convert between character sets. For example, if you have some text written in Danish, some of the characters would be wrongly displayed in English versions of Windows unless you first convert the text. Here, the Danish text would be the "OEM" characters, and the converted text would be the "ansi" or whatever.
>
> > I don't think it's in use anywhere else (except within conversion programs etc.), so you probably won't find conversion routines in standard libraries on other systems. What might be available is conversion between charsets like ISO-8859-n and Unicode (which is trivial for n=1, i.e. the Latin-1 charset, which AFAIK is also the default on (recent?) Windoze versions in west European locales).
> >
> > You could try to take the code from programs such as recode. I'm not sure what you're trying to achieve -- if you want to make a program able to read old DOS charset files because they're still common on Windoze, you can probably just do no conversion on Unix etc. (where normal programs aren't expected to deal with DOS charset files).
>
> What I am trying to do is to get the program to recognise and translate non-English characters correctly. For example, in my unzip code, unless I do this, certain characters in filenames inside the zip file (e.g., umlauts or accented characters) might end up being wrong, and the extracted file then gets a wrong name (or creating it fails because the name contains "illegal" characters).

So the system uses different charsets for file names and for text I/O?

You could look at the Info-ZIP code to see if/how it handles the issue under Unix, or try some zip file with problematic characters and see if unzip under Linux does any conversion and if the result is correct. (If you don't have a Linux machine handy, you can send me such a file to try it.)

Frank

-- Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/ GPC To-Do list, latest features, fixed bugs: http://agnes.dida.physik.uni-essen.de/~gnu-pascal/todo.html

Prof Abimbola Olowofoyeku

10:13 p.m.

On 24 Apr 01, at 3:00, Frank Heckenbach wrote:

[...]

> > What I am trying to do is to get the program to recognise and translate non-English characters correctly. For example, in my unzip code, unless I do this, certain characters in filenames inside the zip file (e.g., umlauts or accented characters) might end up being wrong, and the extracted file then gets a wrong name (or creating it fails because the name contains "illegal" characters).
>
> So the system uses different charsets for file names and for text I/O?

To be honest with you, I am not sure. All I know is that a Danish or German word (i.e., with characters that are not in the normal English alphabet) will, if not converted from "OEM" to "Ansi" (or now, to "char"), not display correctly, and, if it is a filename, then the filename will not be correct either. So, to display the text or create the file correctly, you need to do the conversion first.

> You could look at the Info-ZIP code to see if/how it handles the issue under Unix, or try some zip file with problematic characters and see if unzip under Linux does any conversion and if the result is correct. (If you don't have a Linux machine handy, you can send me such a file to try it.)

unzip fails most miserably under Linux. Of course, "unzip -l" displays the supposed contents of the zip file, but the names are totally wrong (truncated in most cases on encountering the first "foreign" character). "unzip -d" makes a brave attempt to extract the files, but all the "foreign" characters in the filenames are replaced by the "?" character. So the filenames are totally wrong as well. Windows tries to make some sense of the characters - but unzip under Linux doesn't even try (which is what I assume the question marks to mean).

I can send you a sample zip file if you want.

So the question of how to convert these "foreign" characters to something that the OS can understand remains. Like I said before, there is a simple WinAPI routine that does that, but I am trying to find a portable solution from gcc or other (L)GPL libraries.

It may of course be that my Linux (Mandrake 7.1) is broken - but I don't see why that should be so, since everything else works.
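
One portable approach, in the spirit of borrowing tables from a program such as recode, is a plain 256-entry lookup table, which would be straightforward to carry over to Pascal or C. The following is a minimal sketch, assuming code page 437 as the source and Latin-1 as the target (both assumptions). The table here is generated from Python's own codecs rather than taken from recode, and `build_table` is a hypothetical helper name.

```python
def build_table(src: str = "cp437", dst: str = "latin-1", fallback: bytes = b"?") -> bytes:
    """Build a 256-byte table mapping each src byte to its dst byte."""
    table = bytearray(256)
    for value in range(256):
        try:
            table[value] = bytes([value]).decode(src).encode(dst)[0]
        except UnicodeError:
            # Byte undefined in src, or character missing from dst.
            table[value] = fallback[0]
    return bytes(table)

CP437_TO_LATIN1 = build_table()

# bytes.translate applies the table one byte at a time.
name = b"M\x81ller.txt"  # "Müller.txt" in code page 437
print(name.translate(CP437_TO_LATIN1))
```

A fixed table like this only works between single-byte charsets, and characters without a counterpart in the target (such as the DOS box-drawing symbols) are replaced by the fallback byte.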

George Shapovalov

10:37 p.m.

Are you doing all this in the Mandrake 'konsole'? Are you using KDE as your window manager? There is an issue with localisation/using the right fonts in Mandrake. Its default terminal does miserably in that respect and shows 8-bit characters as question marks in many cases. I could not fix this other than by reinstalling KDE myself. As a quick check, try using rxvt instead. Launch it with a command such as:
rxvt -fn "your-font-with-necessary-charset-here"
Be careful with some versions of XFree86 and TrueType fonts; sometimes the character width is miscalculated.

George


Prof Abimbola Olowofoyeku

11:49 p.m.

On 24 Apr 01, at 13:37, George Shapovalov wrote:

> Are you doing all this in the Mandrake 'konsole'?

Yes.

> Are you using KDE as your window manager?

Yes.

> There is an issue with localisation/using the right fonts in Mandrake. Its default terminal does miserably in that respect and shows 8-bit characters as question marks in many cases. I could not fix this other than by reinstalling KDE myself.

I see :-(

> As a quick check, try using rxvt instead. Launch it with a command such as:
> rxvt -fn "your-font-with-necessary-charset-here"

I am not sure what this means!

George Shapovalov

25 Apr

12:58 a.m.

> > As a quick check, try using rxvt instead. Launch it with a command such as:
> > rxvt -fn "your-font-with-necessary-charset-here"
>
> I am not sure what this means!

-fn "font-specification-as-given-by-xfontsel" is an rxvt option that selects the specified font. You need to pick one which contains the necessary charset (in my case that was koi8-r), so the specification would look something like "*-koi8-r*", or maybe "*fixed-*-koi8-r*", or all the way up to a full specification. You can find one by looking at the output of xlsfonts or by using xfontsel. See also 'rxvt --help' or 'man rxvt' for more information on the options rxvt understands.

You may try xterm instead of rxvt (you basically just need any terminal that lets you select which font it uses; I liked rxvt best). This will require you to modify .Xresources or .Xdefaults. Unfortunately, I don't remember the details off the top of my head. If you need it, I can dig up the relevant file and send it to you.

George

Frank Heckenbach

1:07 a.m.

Prof Abimbola Olowofoyeku wrote:

> > You could look at the Info-ZIP code to see if/how it handles the issue under Unix, or try some zip file with problematic characters and see if unzip under Linux does any conversion and if the result is correct. (If you don't have a Linux machine handy, you can send me such a file to try it.)
>
> unzip fails most miserably under Linux. Of course, "unzip -l" displays the supposed contents of the zip file, but the names are totally wrong (truncated in most cases on encountering the first "foreign" character). "unzip -d" makes a brave attempt to extract the files, but all the "foreign" characters in the filenames are replaced by the "?" character. So the filenames are totally wrong as well. Windows tries to make some sense of the characters - but unzip under Linux doesn't even try (which is what I assume the question marks to mean).
>
> I can send you a sample zip file if you want.

Yes, please.

> So the question of how to convert these "foreign" characters to something that the OS can understand remains. Like I said before, there is a simple WinAPI routine that does that, but I am trying to find a portable solution from gcc or other (L)GPL libraries.

I can't tell as long as I don't even know what character set this is in the first place.

Frank

-- Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/ GPC To-Do list, latest features, fixed bugs: http://agnes.dida.physik.uni-essen.de/~gnu-pascal/todo.html
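
When the source charset is unknown, one pragmatic check is to decode the raw filename bytes under several candidate charsets and see which rendering looks right. Below is a small sketch of that idea; the candidate list and the sample bytes are illustrative assumptions, not data from the zip file discussed in this thread, and `show_candidates` is a hypothetical helper name.

```python
CANDIDATES = ("cp437", "cp850", "latin-1", "cp1252")

def show_candidates(raw: bytes, encodings=CANDIDATES) -> dict:
    """Return the text obtained by decoding raw under each candidate charset."""
    # errors="replace" substitutes U+FFFD for bytes a charset does not define.
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}

# Hypothetical sample: byte 0x9B is 'ø' in code page 850 but '¢' in 437.
sample = b"S\x9bren.txt"
for enc, text in show_candidates(sample).items():
    print(f"{enc:8} {text!r}")
```

Whichever candidate produces a plausible name is a reasonable guess for the archive's charset, though a single short filename is rarely conclusive.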

gpc@gnu.de
