Re: Convert std::string to std::wstring using std::ctype widen()

From: "Bo Persson" <bop@gmb.dk>
Newsgroups: microsoft.public.vc.stl
Date: Sat, 25 Nov 2006 15:02:42 +0100
Message-ID: <4sr0o1F11834mU1@mid.individual.net>

Jeffrey Walton wrote:

Hi All,

I've done a little homework (I've read responses to similar questions
from P.J. Plauger and Dietmar Kuehl), and wanted to verify with the
Group. Over in comp.lang.c++, I'm getting a lot of "boot Microsoft"
answers (which do not help me). Below is what I am performing
(Stroustrup's Appendix D recommendations won't compile in Microsoft
VC++ 6.0).

My question is in reference to MultiByte Character Sets. Will this
code perform as expected? I understand every problem has a simple
and elegant solution that is wrong.

I generally use US English or Unicode, so I don't encounter a lot of
issues others may see (a multibyte character using std::string). I
have verified it works with a Hello World sample.

Before I get flamed for not using std::codecvt, Stroustrup states
(D.4.6 Character Code Conversion, p 925):
The codecvt facet provides conversion between different character
sets when a character is moved between a stream buffer and external
storage...

Jeff
Jeffrey Walton

std::string s = "Hello World";
std::ctype<wchar_t> ct;

This constructs a default ctype object of your own, which is not necessarily
the one installed in the current locale. That's why the other code samples
call use_facet<>() to retrieve the currently active facet from a locale.
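To illustrate, here is a minimal sketch of the same loop with the facet taken from a locale through use_facet<>(). The helper name widen_per_char and the defaulted loc parameter are made up for this example, and it assumes a compiler where std::use_facet works as the standard describes (so not VC6). Note that widen() handles one char at a time, so it cannot combine the bytes of a multibyte character into a single wchar_t.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not from the original post): widen each char of s
// using the ctype<wchar_t> facet that belongs to the locale loc.
std::wstring widen_per_char(const std::string& s,
                            const std::locale& loc = std::locale())
{
    // Borrow the facet owned by the locale instead of constructing our own.
    const std::ctype<wchar_t>& ct =
        std::use_facet< std::ctype<wchar_t> >(loc);

    std::wstring ws;
    ws.reserve(s.size());
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it)
    {
        // One char in, one wchar_t out; no multibyte decoding happens here.
        ws += ct.widen(*it);
    }
    return ws;
}
```

Calling widen_per_char("Hello World") uses a copy of the current global locale; pass a different std::locale as the second argument to use another one.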

   std::wstring ws;

   for( std::string::const_iterator it = s.begin();
                                    it != s.end(); it++ )
   {
       ws += ct.widen( *it );
   }

   // Stroustrup again (Section D.4.5, p. 923):
   // A call widen(c) transforms the character c into its
   // corresponding Ch value.
   // If Ch's character set provides several characters
   // corresponding to c, the standard specifies that
   // "the simplest reasonable transformation" be used.

   // http://www.research.att.com/~bs/3rd_loc.pdf
   // by Bjarne himself...
   // page 28 of the above reference
   // or
   // The C++ Programming Language, Special Edition
   // Section D.4.2.2, p 895 (Full Manual)
   //
   // const std::locale& loc = s.getloc();
   // wchar_t w = std::use_facet< std::ctype<char> >
   // (loc).widen(c);
   // does not compile in Microsoft's VC++ 6.0 environment...
   // getloc() is not a member of std::basic_string< ... > ...

In Stroustrup's example, s must be a stream, not a string. A stream has a
locale; a string does not.
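As a rough sketch of what that looks like with a stream, the locale comes from getloc() and the facet from use_facet<>(). The helper name widen_with_stream_locale is invented for this example, and it assumes a compiler with a conforming std::use_facet.

```cpp
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): widen one char using
// the locale currently imbued in the stream os.
wchar_t widen_with_stream_locale(const std::wostream& os, char c)
{
    // getloc() is a member of std::ios_base, so every stream has it;
    // std::basic_string has no such member.
    return std::use_facet< std::ctype<wchar_t> >(os.getloc()).widen(c);
}
```

With a std::wostringstream named out, widen_with_stream_locale(out, 'A') gives the wide form of 'A' in the stream's locale. Streams also offer this directly as the member function out.widen('A').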

   //
   // wchar_t wc = std::use_facet< std::ctype<wchar_t> >
   // (out.getloc()).widen(*it);
   // does not compile in Microsoft's VC++ 6.0 environment...

VC6 has its own set of limitations, especially with templates. Don't use it
unless you absolutely have to.

   //
   // Dietmar Kuehl code
   // does not compile in Microsoft's VC++ 6.0 environment...
   //
   // std::wstring to_wide_string(std::string const& source) {
   // typedef std::ctype<wchar_t> CT;
   // std::wstring rc;
   // rc.resize(source.size());
   // CT const& ct = std::use_facet<CT>(std::locale());
   // ct.widen(source.data(), source.data() +
   // source.size(), rc.data());
   // return rc;

This gets its ctype from the current default locale. It ought to work.

Except that std::use_facet may not work in VC6, so you might have to use
the macro workaround _USE(loc, facet) instead.

Or upgrade, if you can!
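For a compiler where std::use_facet does work, the quoted to_wide_string can be sketched roughly as below. This is an adaptation, not Dietmar's exact code: it writes through &rc[0] because rc.data() only returns a pointer to const before C++17, and it guards against the empty string so that &rc[0] is never taken on an empty buffer.

```cpp
#include <locale>
#include <string>

// Sketch of the quoted function, adapted as described above.
std::wstring to_wide_string(const std::string& source)
{
    typedef std::ctype<wchar_t> CT;

    // Pre-size the result so widen() has a buffer to write into.
    std::wstring rc(source.size(), L'\0');
    if (!source.empty())
    {
        // The facet comes from a copy of the current global locale.
        const CT& ct = std::use_facet<CT>(std::locale());

        // Range overload: widens [begin, end) into the destination buffer.
        ct.widen(source.data(), source.data() + source.size(), &rc[0]);
    }
    return rc;
}
```

The range overload does the same per-character transformation as calling widen() in a loop, just in one call.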

Bo Persson