Ulrich Eckhardt
Mon, 27 Aug 2007 09:33:51 +0200
Jeff Relf wrote:

Just to prove my point, you ( Mr. Eckhardt ) have a Big-Byte-First box,
but your code only reads in Little-Byte-First UTF-16 files.

No. I do have a big-endian machine at home, but the code I was talking about
is here at work. And at work, we also only support ISO-8859-1 and UTF-16LE,
for backward compatibility. I believe the XML specification names encodings
that every parser must support (UTF-8 and UTF-16, as far as I remember), but
I'm not sure about that.

My code only reads in "UTF-16 Little-Byte-First"
and UTF-8, where the "Magic" first bytes of headerless files are:
  const wchar_t Magic_UTF_16 = 0xFeFF ;
  const uchar Magic_UTF_8[] = { 0xeF, 0xbb, 0xbF };

I'm not sure what you mean here, in particular why you are calling
things 'magic' instead of the standard BOM and 'little-byte-first' instead
of little-endian.
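
To make the terminology concrete, here is a minimal sketch of BOM detection on raw bytes. The names Encoding and detect_bom are hypothetical and not taken from anyone's code in this thread. It compares individual bytes rather than a wchar_t, so the result does not depend on the endianness of the machine or on the size of wchar_t. UTF-32 BOMs are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a buffer for a byte order mark (U+FEFF).
// Only bytes are compared, so host byte order does not matter.
// Note: UTF-32 BOMs are not handled in this sketch.
Encoding detect_bom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Encoding::Utf8;     // U+FEFF encoded as UTF-8
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Encoding::Utf16LE;  // low byte of U+FEFF comes first
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Encoding::Utf16BE;  // high byte of U+FEFF comes first
    return Encoding::Unknown;      // no BOM: caller needs a default or a hint
}
```

A file without a BOM yields Encoding::Unknown, so the caller still has to fall back on a default or on external information such as an XML declaration.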

And you won't find B.B.F. UTF-16 or surrogate pairs out in the wild.

Probably that's true. As far as I know, the main systems using UTF-16 or
UCS-2 are MS Windows systems, and those don't run on big-endian machines.
Thinking about it, I believe Java uses UTF-16 for its internal strings, but
I'm not sure how that affects files written with it...
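
Supporting both byte orders and surrogate pairs does not cost much, though. The following is only a sketch under my own assumptions: decode_utf16 is a hypothetical helper, the caller has already skipped any BOM and decided the byte order, and malformed input is replaced with U+FFFD rather than rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode UTF-16 bytes into Unicode code points.
// The byte order is passed in explicitly, so the same code runs unchanged
// on little-endian and big-endian hosts.
std::vector<std::uint32_t> decode_utf16(const unsigned char* data,
                                        std::size_t size, bool big_endian)
{
    assert(data != nullptr || size == 0);
    const std::uint32_t replacement = 0xFFFD;  // used for malformed input
    std::vector<std::uint32_t> out;

    // Assemble one 16-bit code unit from two bytes at position pos.
    auto unit_at = [&](std::size_t pos) -> std::uint32_t {
        return big_endian
            ? (std::uint32_t(data[pos]) << 8) | data[pos + 1]
            : (std::uint32_t(data[pos + 1]) << 8) | data[pos];
    };

    std::size_t i = 0;
    while (i + 1 < size) {
        const std::uint32_t u = unit_at(i);
        i += 2;
        if (u >= 0xD800 && u <= 0xDBFF) {            // high surrogate
            if (i + 1 < size) {
                const std::uint32_t lo = unit_at(i);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {  // matching low surrogate
                    i += 2;
                    out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
                    continue;
                }
            }
            out.push_back(replacement);              // unpaired high surrogate
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            out.push_back(replacement);              // stray low surrogate
        } else {
            out.push_back(u);                        // ordinary BMP character
        }
    }
    if (i < size)
        out.push_back(replacement);                  // trailing odd byte
    return out;
}
```

The only difference between the two byte orders is which byte is shifted into the high half of the code unit; the surrogate handling is identical for both.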


