Re: How to find only one invalid char in src buffer with MultiByte
Hi Igor Tandetnik and Alex Blekhman,
Thank you for your prompt response. I got the same results here.
MB_PRECOMPOSED returns zero in every case. What are precomposed characters?
I am working on the client side of a client/server application. We are
adding Unicode support to the application while keeping backward
compatibility. For this I chose the WideCharToMultiByte and
MultiByteToWideChar APIs to process data in UTF-8 or ACP. Is that correct?
Requirement: my application receives data (void*) from the server through
sockets. I need to identify whether this data is encoded in UTF-8 or ACP. I
am sure the server encodes the data in one of the two.
I also tried the code snippet below, without success. Any clue on how to
achieve this?
code snippet:
---------------
CHAR szData[256] = {0};
strcpy(szData, "1é2345"); // é has value 233 (or -23 as a signed char)
// If é appears more than once in the szData buffer, the call succeeds.
INT nDataLen = (INT)strlen(szData);
INT nDesBufferLen = ::MultiByteToWideChar(CP_UTF8,
                                          MB_ERR_INVALID_CHARS,
                                          szData,
                                          -1,
                                          0,
                                          0);
if (nDesBufferLen == 0) // Here it should return zero
{
    nDesBufferLen = ::MultiByteToWideChar(CP_ACP,
                                          0,
                                          szData,
                                          -1,
                                          0,
                                          0);
}
--
Thanks & Regards,
Bill.
"Alex Blekhman" wrote:
"Bill" wrote:
I am filling a char buffer with characters in the 0-127 range plus
an é character, and then the MultiByteToWideChar API fails. If I
include the é character twice, it succeeds. Please find the code
snippet below, and correct me if I am wrong. My system settings
are United States, English; VC++ 6.0, Windows XP.
CHAR szData[100] = {0};
strcpy(szData, "1é2345");
INT nWideCharBufferLen = MultiByteToWideChar(CP_UTF8,
MB_PRECOMPOSED,
szData, -1, 0, 0 );
You're getting unpredictable results because you specified the wrong
code page: CP_UTF8. Your string is not a valid UTF-8 sequence. That's
why `MultiByteToWideChar' fails. Garbage in, garbage out. The 'é'
character (Latin small letter E with acute) has the value 0xE9
(11101001 in binary). According to the UTF-8 format, a leading byte
with a value in E0-EF (11100000-11101111) must be followed by two
more bytes, each with a value in 0x80-0xBF.
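To make the byte rule concrete, here is a minimal sketch in plain C. The helper names utf8_trail_count and utf8_is_trail are made up for illustration and are not part of the Windows API. It only classifies single bytes; it does not validate a whole buffer.

```c
#include <stddef.h>

/* Hypothetical helper: number of continuation bytes that must follow a
   UTF-8 lead byte, or -1 if the byte cannot start a sequence. */
static int utf8_trail_count(unsigned char lead)
{
    if (lead <= 0x7F) return 0;                  /* 0xxxxxxx: plain ASCII  */
    if (lead >= 0xC2 && lead <= 0xDF) return 1;  /* 110xxxxx: 2-byte form  */
    if (lead >= 0xE0 && lead <= 0xEF) return 2;  /* 1110xxxx: 3-byte form  */
    if (lead >= 0xF0 && lead <= 0xF4) return 3;  /* 11110xxx: 4-byte form  */
    return -1;  /* continuation byte, overlong lead (C0, C1), or above F4 */
}

/* Hypothetical helper: nonzero if b is a continuation byte (10xxxxxx),
   i.e. in the range 0x80-0xBF. */
static int utf8_is_trail(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

For the string in question, utf8_trail_count(0xE9) is 2, so two bytes in 0x80-0xBF are expected next. The byte that actually follows is '2' (0x32), for which utf8_is_trail returns 0, so the sequence is malformed.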
You should specify the correct code page when you call
`MultiByteToWideChar', for example CP_ACP.
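If the code page is not known in advance, one possible approach is to check the received bytes for well-formed UTF-8 before choosing. The sketch below is portable C and rests on some assumptions: the names guessed_encoding, is_valid_utf8 and guess_encoding are hypothetical, the sender is assumed to use only UTF-8 or the ANSI code page, and the result is a heuristic. Pure ASCII is valid in both encodings, and ANSI text can occasionally happen to form valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical result type: which encoding the buffer most likely uses. */
typedef enum { ENC_UTF8, ENC_ACP } guessed_encoding;

/* Returns 1 if buf[0..len) is well-formed UTF-8, otherwise 0. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        unsigned char lo = 0x80, hi = 0xBF;  /* range of first trail byte */
        size_t trail, k;

        if (b <= 0x7F) { i++; continue; }    /* ASCII */
        else if (b >= 0xC2 && b <= 0xDF) trail = 1;
        else if (b >= 0xE0 && b <= 0xEF) {
            trail = 2;
            if (b == 0xE0) lo = 0xA0;        /* reject overlong forms  */
            if (b == 0xED) hi = 0x9F;        /* reject surrogates      */
        } else if (b >= 0xF0 && b <= 0xF4) {
            trail = 3;
            if (b == 0xF0) lo = 0x90;        /* reject overlong forms  */
            if (b == 0xF4) hi = 0x8F;        /* reject above U+10FFFF  */
        } else {
            return 0;                        /* not a valid lead byte  */
        }

        if (len - i - 1 < trail) return 0;   /* sequence is truncated  */
        if (buf[i + 1] < lo || buf[i + 1] > hi) return 0;
        for (k = 2; k <= trail; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
        i += trail + 1;
    }
    return 1;
}

/* Guess the encoding of a received buffer: UTF-8 if it validates,
   otherwise fall back to the ANSI code page. */
static guessed_encoding guess_encoding(const void *data, size_t len)
{
    return is_valid_utf8((const unsigned char *)data, len) ? ENC_UTF8
                                                           : ENC_ACP;
}
```

The result could then select between CP_UTF8 and CP_ACP in the call to `MultiByteToWideChar'. For the buffer from the question (bytes '1', 0xE9, '2', '3', '4', '5') the guess is ENC_ACP, because 0xE9 is not followed by two continuation bytes.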
HTH
Alex