Re: stdin charset
On Apr 30, 9:27 am, Antimon <anti...@gmail.com> wrote:
> I'm new to C/C++ and working on string stuff with Visual Studio 2005.
NB. I'm not an expert on this, but am posting because nobody else
has yet, so perhaps I can help you a little, at least.
> I'm trying to understand something. For example, when I do this:
> wstring st;
> wcin >> st;
> if the input is pure ASCII, everything is OK, but if there are
> Unicode characters like "ş" (U+015F), what is the encoding of st now?
It depends on your compiler and library. On Windows, wchar_t is
16 bits, so with Microsoft's compiler it's likely to be UTF-16.
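If st really does hold UTF-16 (or UTF-32 where wchar_t is 32 bits, as on most Unix systems), converting it to UTF-8 needs no locale support at all. Here is a minimal sketch of my own, not a library API: to_utf8() and append_utf8() are hypothetical names, and it assumes the wstring holds well-formed UTF-16 or UTF-32. For example, U+015F comes out as the two bytes C5 9F.

```cpp
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point to out.
static void append_utf8(std::string &out, unsigned long cp)
{
    if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wstring holding UTF-16 (16-bit wchar_t)
// or UTF-32 (32-bit wchar_t) to UTF-8. Assumes well-formed input:
// an unpaired surrogate is encoded as-is instead of being rejected.
std::string to_utf8(const std::wstring &ws)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < ws.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(ws[i]);
        // A high surrogate followed by a low surrogate is one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            unsigned long lo = static_cast<unsigned long>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The surrogate check only ever fires with 16-bit wchar_t, so the same function works unchanged on both kinds of platform.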
> Everything works when I use this st string, do stuff, write to cout,
> etc., but if I want to convert this string to UTF-8, what encoding am I
> converting from?
C++ includes the C functions for converting between "wide
characters" and "multibyte character sequences". The standard doesn't
require the multibyte encoding to be UTF-8; it is whatever the current
locale (selected with setlocale()) uses. Try calling wcstombs() on
your wstring's c_str(), and if you're lucky it will spit out UTF-8.
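A minimal sketch of the wcstombs() approach; to_multibyte() is a hypothetical helper name. It assumes the program has already picked a locale (e.g. std::setlocale(LC_ALL, "") from <clocale> at the start of main()). Whether the result is UTF-8 depends entirely on that locale: on Linux with a UTF-8 locale it usually is, while as far as I know Visual Studio 2005's C runtime does not offer UTF-8 as a locale encoding.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to the multibyte encoding of
// the current C locale. Nothing in the standard says that this is UTF-8.
std::string to_multibyte(const std::wstring &ws)
{
    // With a null destination, wcstombs() only counts the bytes needed,
    // not including the terminating null.
    std::size_t len = std::wcstombs(NULL, ws.c_str(), 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("string not representable in the current locale");

    std::vector<char> buf(len + 1);       // room for the terminating null
    std::wcstombs(&buf[0], ws.c_str(), buf.size());
    return std::string(&buf[0], len);
}
```

If the locale cannot represent a character (for instance ş in the default "C" locale), wcstombs() returns (size_t)-1, which the sketch turns into an exception.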
> Btw, when I do something like this:
> wstring a = L"ş";
> wstring b;
> wcin >> b;
> and type "ş" into the console,
> (a == b) is false. I checked a and it's Unicode (16-bit), but b is not
> Unicode; I could not manage to find out what it is.
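You can check what you have got by printing it out as a series
of unsigned chars, e.g.:
#include <stdio.h>
#include <stddef.h>

void hex_dump( void const *ptr, size_t nbytes )
{
    /* C++ needs an explicit cast from void const * */
    unsigned char const *p = static_cast<unsigned char const *>(ptr);
    while (nbytes--)
        printf("%02X ", *p++);
    putchar('\n');
}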
and then call it like this:
hex_dump( a.c_str(), a.size() * sizeof(wchar_t) );
hex_dump( b.c_str(), b.size() * sizeof(wchar_t) );
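The raw bytes come out in the machine's byte order, two or four per wchar_t, which can be awkward to read. A variant that prints one hex value per wchar_t may be easier to compare; hex_units() is a hypothetical name, not a library function. If a shows 015F and b shows something else, the console input was not converted to Unicode the way the literal was.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render each wchar_t as a 4+ digit hex value,
// separated by spaces, independent of the machine's byte order.
std::string hex_units(const std::wstring &ws)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::wstring::size_type i = 0; i < ws.size(); ++i) {
        if (i != 0)
            out << ' ';
        // Cast to an integer type so the code unit prints as a number.
        out << std::setw(4) << static_cast<unsigned long>(ws[i]);
    }
    return out.str();
}
```

Usage: printf("%s\n", hex_units(b).c_str());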