At 13:16 +0100 13/3/06, Frank Heckenbach wrote:
Peter N Lewis wrote:
There may not be any point in supporting Unicode any further. From
I don't agree. It's not only Length (which is defined by the (mis-)using them as UTF-8 bytes, but then you're on your own if Length, SubStr/Copy, Index/Pos etc. behave strangely.)
Actually, with UTF-8, there are rarely any issues with Length, SubStr, Copy, Index, or Pos.
With UTF-8:
* Assuming valid UTF-8 strings, Pos will never mis-match.
* Length returns the "size" of the string. Given UTF-8, there must be two different functions, one to return the size of the string in bytes and one to return it in chars. Which of the two you call "Length" is personal preference.
* Searching for an ASCII character will always work as expected.
* SubStr/Copy require valid indexes and lengths, but the result will be either correct or an explicitly invalid UTF-8 string.
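To make the points above concrete, here is a small sketch in Python rather than Pascal, using a byte string to stand in for a Pascal string that holds UTF-8. The sample text and the helper name is_valid_utf8 are just for illustration, and indexes are 0-based here where Pascal's are 1-based.

```python
# A byte string standing in for a Pascal string that holds UTF-8.
# The text is "naive cafe" with i-diaeresis (U+00EF) and e-acute (U+00E9).
s = "na\u00efve caf\u00e9".encode("utf-8")

# Two different notions of "length":
byte_length = len(s)                   # what a byte-oriented Length returns
char_length = len(s.decode("utf-8"))   # number of code points

# Searching for an ASCII byte cannot land inside a multi-byte sequence,
# because every byte of such a sequence has its high bit set.
pos_c = s.find(b"c")                   # 0-based here, Pascal's Pos is 1-based


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Slicing on a character boundary gives a valid string; slicing through
# the middle of the two-byte U+00EF gives a detectably invalid one.
on_boundary = s[:4]
mid_sequence = s[:3]
```

The slice taken in the middle of a sequence is not silently wrong: it fails validation, which is the sense in which the result is either correct or an invalid UTF-8 string.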
For example, if you have a search string, a replace string, and a source string, the exact same code using Pos and Copy will work for ASCII and for UTF-8, assuming all the strings are valid ASCII or valid UTF-8 respectively.
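A rough sketch of that search-and-replace, again in Python with byte strings standing in for Pascal strings. The function name replace_all is made up for the example; bytes.find plays the role of Pos and slicing plays the role of Copy. Nothing in it knows about the encoding.

```python
def replace_all(source: bytes, search: bytes, replacement: bytes) -> bytes:
    """Replace every occurrence of search in source, working purely on bytes."""
    if not search:
        return source                      # nothing to look for
    out = bytearray()
    start = 0
    while True:
        hit = source.find(search, start)   # plays the role of Pos
        if hit < 0:
            out += source[start:]          # copy the tail and finish
            return bytes(out)
        out += source[start:hit]           # plays the role of Copy
        out += replacement
        start = hit + len(search)          # continue after the match
```

Given valid UTF-8 for all three arguments, a match can only begin on a character boundary, since lead bytes and continuation bytes use disjoint value ranges, so the output is valid UTF-8 as well.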
Handling case insensitivity is more entertaining of course, but then it's already rarely handled well even with just ISO-8859-1.
Anyway, if someone thinks Unicode32 is worth implementing in the RTS, go for it. I'd just suggest that it's becoming less and less relevant.
Enjoy, Peter.