Re: TCHAR * to WCHAR*
Ben Voigt [C++ MVP] wrote:
"Alex Blekhman" <tkfx.REMOVE@yahoo.com> wrote in message
news:uK7CXfCOIHA.4076@TK2MSFTNGP03.phx.gbl...
"Ben Voigt [C++ MVP]" wrote:
The ANSI standard includes mbtowc and wctomb.
So, I'm out of excuses. :) The only reason I can think of is that
`std::string' was devised long before mbtowc/wctomb were standardized.
That's not true either.
The real reason is that std::string is a template instantiation of
std::basic_string<char>, and std::wstring of std::basic_string<wchar_t>,
and there's not a good way to express the conversion in a generic,
template-friendly way.
AFAIK, basic_string<char> and basic_string<wchar_t> are the only two
instantiations that must be supported, so a mere declaration of the base
template would be enough, plus specialisations, which are allowed to
contain anything, i.e. they don't necessarily have equivalent interfaces.
So I would expect to find some loose functions in
the global namespace for the conversion, not member functions of
std::basic_string.
True, but they're not there.
Also, there is the big problem that while 'wchar_t' typically represents
Unicode codepoints (well, with the exception of win32, where it contains
UTF-16 code units), the meaning of 'char' differs _a_ _lot_, so you
can't provide a 'generic' conversion that makes more sense than
std::copy().
Note that you could use the current locale's charset, but that would only
lead to errors: just because a program is run in a different locale, the
meaning of its internal string constants doesn't suddenly change, and
neither does the meaning of its input (in particular input read from a
file). So it is actually good to make people think about what they are
doing instead of helping them do the wrong thing.
BTW: The mentioned std::codecvt is not the correct (or intended) way
either, because its goal is to convert between bytes (as stored on disk or
transferred using other forms of IO) and the internally used charset, e.g.
a UTF-8 to wchar_t conversion, not to convert between different charsets
directly.
Coming back to win32, this is where codecvt and basic_string actually fail
completely, because they both assume that a single internal charset
element always maps to a single character. However, UTF-16 (win32's
wchar_t) actually requires two elements for surrogate pairs (let's ignore
combining characters like accents for now) and UTF-8 requires up to four
elements (theoretically up to six). IOW, 'string' is not suitable for
UTF-8 and 'wstring' is not suitable for UTF-16!
Anyway, back to the OP's problem: the fact remains that there is no single
conversion between CHAR and WCHAR, because the meaning of CHAR is not
fixed; you always need to know the codepage that the CHAR string is
encoded with. Then you can simply use the MultiByteToWideChar() function
(or rather a suitable wrapper that works with std::[w]string or whatever
string class you are using).
Uli