Markus Gerwinski wrote:
Frank Heckenbach wrote:
Markus Gerwinski wrote:
One more possible solution: In my C layer, I'll transform any wchar_t into its unicode char representation before passing it to the Pascal interface. So there shouldn't be any data loss at all.
What do you mean by "unicode char representation"? I thought wchar_t was just that. Or do you mean the 7/8 bit representation of chars that exist in that charset (ASCII/Latin1/...)? But what about other chars then?
No, wchar_t and unicode are different things in this context. A wchar_t always has a fixed size of 2 or 4 bytes, depending on the platform. In the unicode representation, each one is converted into a variable number of "normal" characters. So e.g. the wchar_t 'A' is transformed into the 1-byte char 'A', whereas some special characters may be transformed into a 2-byte or 3-byte representation (AFAIK 3 bytes is the maximum here, but I'm not sure). See `man wcrtomb' for details on that.
That's UTF-8 (Unicode Transformation Format, 8-bit), actually.
OK, if you can afford to process data in UTF-8 on the Pascal side (as Waldek noted, different formats have different advantages), that should be alright.
Frank
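
For illustration, the conversion discussed in this thread can be sketched in plain C. This is only a minimal sketch, not the code from the C layer mentioned above: the function name utf8_encode is hypothetical, and it encodes by hand instead of calling wcrtomb, because wcrtomb produces the multibyte encoding of the current locale, which is UTF-8 only when a UTF-8 locale is active. It also assumes the input value is a Unicode code point, which holds for a 4-byte wchar_t on common platforms but is not guaranteed by the C standard. Note that code points up to U+FFFF need at most 3 bytes, while those above U+FFFF need 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   `out` must have room for at least 4 bytes.
   Returns the number of bytes written, or 0 if `cp` is not a
   valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        /* ASCII range: stored unchanged in a single byte. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        /* Surrogate values are reserved for UTF-16 and are rejected. */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    /* Beyond U+10FFFF: not a Unicode code point. */
    return 0;
}
```

Calling utf8_encode for each wchar_t of a string and appending the resulting bytes gives a byte string that the Pascal side can treat as ordinary 8-bit chars, with no data loss, as long as the Pascal code does not assume one byte per character.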