Re: Getting an ifstream for a file with unicode chars in the file name
"Zapanaz" wrote:
I am wondering though, I know this is an impossible question to
make a definitive answer to, but roughly, how likely do you think
it would be that a file on a user's hard drive would not be
representable in their current code page?
I think that nowadays it's pretty likely. With Windows XP/2K
everywhere, the code page is almost irrelevant. For example, I
never change my machine's regional settings. It's always US
English for both the user and the system locale. However, I have
a lot of files in "My Documents" that have non-English names.
With a system such as 2K/XP, which supports Unicode natively,
people just don't care about "system friendly" names anymore.
They name files and folders just as they name physical files, in
their native language. And there are more non-English users out
there in the world than English speakers/computer geeks.
Moreover, even among English speakers there are enough pedants
who write "coöperative" or "fiancé". This castrated charset that
we have employed for the last couple of decades in computers (and
before then in telegraphs) is finally fading away. People again
write their texts as they were always used to, with funny
characters and in many languages.
how are the file names actually stored by the
file system, UTF-16 in NTFS?
Yes, NTFS stores a filename as a sequence of 16-bit values.
However, no validation of actual UTF-16 conformance is performed.
A filename may contain any 16-bit value (except the reserved
ones).
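Since the file system does not validate the name, a program that
needs real UTF-16 has to check for itself. Here is a rough sketch
of such a check in portable C++. The function name
is_well_formed_utf16 is my own invention for illustration, not
part of any Windows or NTFS API; it only looks for unpaired
surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Windows API): returns true if `name` is
// well-formed UTF-16, i.e. every high surrogate (0xD800-0xDBFF) is
// immediately followed by a low surrogate (0xDC00-0xDFFF) and no low
// surrogate appears on its own.
bool is_well_formed_utf16(const std::u16string& name) {
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char16_t c = name[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: the next unit must be a low surrogate.
            if (i + 1 >= name.size()) return false;
            const char16_t next = name[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i;  // Skip the low surrogate we just consumed.
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

A name consisting of a single 0xD800 unit would fail this check,
yet nothing stops such a sequence of 16-bit values from being
stored as a filename.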
If file names are stored in UTF-16, there could definitely be
file names that could not be represented in the user's code page.
You're correct.
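To make the idea concrete, here is a minimal sketch of a
representability test. It assumes, purely for illustration, that
the "code page" is ISO-8859-1, which covers exactly U+0000 through
U+00FF; fits_in_latin1 is a made-up name. Real Windows code would
more likely call WideCharToMultiByte and inspect its
lpUsedDefaultChar output instead of hard-coding a range.

```cpp
#include <string>

// Hypothetical stand-in for a real code-page lookup: treats the
// "code page" as ISO-8859-1 (an assumption for this sketch), so a
// name is representable only if every 16-bit unit is <= 0x00FF.
bool fits_in_latin1(const std::u16string& name) {
    for (char16_t c : name) {
        if (c > 0x00FF) return false;  // Outside the assumed code page.
    }
    return true;
}
```

Under that assumption u"fianc\u00E9.txt" fits, while a name with
Cyrillic letters such as u"\u0444\u0430\u0439\u043B.txt" does not,
so a narrow-character file API could not name that second file.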
Alex