Re: Why std::cout stop working when output MBCS string?
Dancefire wrote:
1 locale loc(".936"); // Code page 936
2 locale::global(loc);
3 cout.imbue(loc);
4 string test("\xba\xba\xd7\xd6"); // MBCS string "汉字" in GBK encoding.
5 cout << "Before output MBCS string." << endl;
6 cout << test << endl;
7 cout << "After output MBCS String." << endl;
I think there is a misunderstanding here: the locale, or rather its code
page, affects the output(!) and not how internal char sequences are
interpreted. IOW, you pass the stream a string and it converts the
characters to CP 936. However, I don't think there is a way to represent the
string you want as an internal char sequence. That is a limitation (or
abstraction) of C++ IOStreams, which assume that each internal character
consists of exactly one element; it doesn't apply to other APIs.
Now, what you should do is simply use wchar_t instead of char. IOW, use
things like std::wcout, std::wstring, std::wfstream etc. Note that you then
use Unicode internally (UCS-2 to be precise, assuming a 16-bit wchar_t as on
Windows).
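To illustrate the "one element per character" point, here is a minimal sketch. The function names are made up for this example, and I'm assuming the GBK bytes in your code stand for the two characters U+6C49 and U+5B57. The narrow string holds four elements for two characters, while the wide string holds exactly two.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of char elements in the GBK-encoded bytes.
std::size_t narrow_element_count() {
    return std::string("\xba\xba\xd7\xd6").size(); // two bytes per character
}

// Hypothetical helper: number of wchar_t elements for the same text.
// Assumption: the text is U+6C49 U+5B57; both are in the BMP, so each one
// fits into a single wchar_t whether wchar_t is 16 or 32 bits wide.
std::size_t wide_element_count() {
    return std::wstring(L"\u6c49\u5b57").size();
}
```

With the wide string, every element is a whole character, which is what the stream's conversion step expects as input.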
If I want the line 7 be output, I have to add cout.clear() between
line 6 and line 7.
Yep, the conversion fails, so the output fails and the stream state gets its
fail bit set. Every later output is then ignored until you call clear().
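You can see the same behaviour without any locale involved. The sketch below uses a std::ostringstream and sets the fail bit by hand as a stand-in for the failed conversion; write_with_failure is a name I made up for the example.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Writes three pieces to an in-memory stream and forces a failure after the
// first one. If recover is true, the error state is cleared before the third
// write, which mirrors the cout.clear() workaround.
std::string write_with_failure(bool recover) {
    std::ostringstream out;
    out << "before;";
    out.setstate(std::ios_base::failbit); // stand-in for the failed conversion
    out << "lost;";                       // ignored: the stream is in a failed state
    if (recover) {
        out.clear();                      // reset the error flags
    }
    out << "after;";
    return out.str();
}
```

Without the clear() call only the first piece arrives; with it, the first and the third do. The piece written while the stream was in the failed state is lost either way.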
I tried to modify the code to make it work. I found, if I remove the
line 2, the code will work properly. But it doesn't make any sense.
I can't quote chapter&verse, but I think that std::cout fetches its locale
from the global locale on its first output operation. However, this still
doesn't make sense: both lines 2 and 3 should then have the same effect on
std::cout.
If I'm wrong, what is the correct procedure for output MBCS to console
in STL way?
Two things here:
1. This has zero to do with the STL! What you mean is the C++ standard
library; the STL doesn't contain any IOStreams or localization.
2. How this works might differ from system to system; the strings passed to
locale() are generally not portable. A portable way would be to write a
codecvt facet that converts internal wchar_t to external CP936, but even
that would have to be adjusted slightly for different sizes, encodings and
signednesses of wchar_t.
BTW: What I personally would rather do is use UTF-8 and convert any output
with a dedicated converter.
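As a rough idea of what such a converter does, here is a minimal sketch that turns a sequence of Unicode code points into UTF-8 bytes. It assumes C++11 (char32_t, std::u32string); append_utf8 and to_utf8 are hypothetical names, and the code does no validation, so surrogates and values above U+10FFFF are not rejected.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to out.
// Assumption: cp is a valid Unicode scalar value; nothing is checked here.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a whole string of code points to UTF-8.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

The result is a plain byte string, so it can be written through an ordinary narrow stream without any locale-dependent conversion. Whether the console then displays it correctly is a separate, system-specific question.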
Uli