Re: Reading an array from file?
On Aug 10, 11:16 pm, Jerry Coffin <jerryvcof...@yahoo.com> wrote:
In article <4f75c139-2e29-4870-8df3-
[ ... ]
In practice, of course, there's still a lot of non-Unicode
floating around as well, and not all Unicode files contain a
BOM, so things get more complicated. Even when limiting
myself to Unicode, I'll read the first four bytes---if
there's a BOM, fine, but even if there's not, I'll look for
0x00 bytes; if their position and number correspond to one of
the UTF-16 or UTF-32 formats (assuming the first two
characters have a Unicode encoding of less than 0xFF), I'll
assume that format. It's not guaranteed, and will almost
certainly fail if I get a file with Chinese text and no BOM,
but it works often enough to be worthwhile. At least in my
environment (where files with Chinese text are very rare).
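
For concreteness, the four-byte guess described above might look roughly like the following. This is only a sketch under stated assumptions, not the actual code: the names Encoding and guessEncoding are made up for illustration, and anything that matches neither a BOM nor a zero-byte pattern is simply reported as Unknown (a caller would then fall back to some 8-bit default).

```cpp
#include <cstddef>

enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Hypothetical helper: the name and the enum are illustrative only.
// Guesses the encoding from the first n bytes of the input; at most
// four bytes are examined.
Encoding guessEncoding(const unsigned char* b, std::size_t n)
{
    // 1. Byte order marks. UTF-32LE is tested before UTF-16LE because
    //    its BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
    if (n >= 4) {
        if (b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding::Utf32LE;
        if (b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return Encoding::Utf32BE;
    }
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return Encoding::Utf8;
    if (n >= 2) {
        if (b[0] == 0xFF && b[1] == 0xFE) return Encoding::Utf16LE;
        if (b[0] == 0xFE && b[1] == 0xFF) return Encoding::Utf16BE;
    }

    // 2. No BOM: look at where the 0x00 bytes fall. This only works
    //    if the first two characters are non-null and below 0x100,
    //    so it will misfire on, say, Chinese text without a BOM.
    if (n >= 4) {
        const bool z0 = b[0] == 0, z1 = b[1] == 0;
        const bool z2 = b[2] == 0, z3 = b[3] == 0;
        if ( z0 &&  z1 &&  z2 && !z3) return Encoding::Utf32BE; // 00 00 00 xx
        if (!z0 &&  z1 &&  z2 &&  z3) return Encoding::Utf32LE; // xx 00 00 00
        if ( z0 && !z1 &&  z2 && !z3) return Encoding::Utf16BE; // 00 xx 00 yy
        if (!z0 &&  z1 && !z2 &&  z3) return Encoding::Utf16LE; // xx 00 yy 00
    }
    return Encoding::Unknown;
}
```

Note that FF FE 00 00 is ambiguous (it could also be a UTF-16LE BOM followed by a null character); the sketch arbitrarily prefers UTF-32LE.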
When you start doing work for Windows, you'll probably want to
look at IsTextUnicode(). It does roughly the kind of guessing
you describe above, but IIRC, it looks at something like 8K of
text instead of four bytes.
My code should still be portable, when possible. I'd also like
it to work from streamed input---even a four character buffer
introduces significant complications into my code. And the
documentation of the "results" suggests that IsTextUnicode()
really only looks for UTF-16.
The current version of Visual Studio also seems to work
fine with UTF-8 and UTF-16 (BE & LE) text files as well.
It preserves the BOM and endianness when saving a modified
version -- but if you want to use it to create a new file
with UTF-16BE encoding (for example) that might be a bit
more difficult (I haven't tried very hard, but I don't
immediately see a "Unicode big endian" option like Notepad has).
I'm afraid I can't help you there. (Now if it were vim...)
But it sounds like Microsoft is being inconsistent.
Well, sort of. Then again, the programs are enough different
in general that consistency between them would be a bit like
consistency between a skateboard and a delivery truck -- they
both have four wheels, but almost everything else is quite different.
What I meant was more general. On one hand, Windows seems to
tend toward UTF-16; on the other, Visual Studio doesn't allow
you to create it.
James Kanze (GABI Software) email:email@example.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34