Re: How to find only one invalid char in src buffer with MultiByteToWi
Bill <Bill@discussions.microsoft.com> wrote:
I am filling a char buffer with characters in the 0-127 range along with
one é character, and then the MultiByteToWideChar API fails. If I include
the é character twice, it succeeds. Please find the code snippet below.
Please correct me if I am wrong. My system settings are United States,
English. VC++ 6.0, Windows XP.
CHAR szData[100] = {0};
strcpy(szData, "1é2345");
INT nWideCharBufferLen = MultiByteToWideChar(CP_UTF8, MB_PRECOMPOSED,
                                             szData, -1, 0, 0);
// The issue is here.
I think the MultiByteToWideChar API should return 0, but it always
returns 5. How?
It returns 0 for me. The documentation states that dwFlags (the second
parameter) must be zero for some encodings, including UTF-8.
When I try with zero for flags, the function returns 6 (the space for
"12345" plus terminating NUL). It skips over the invalid character. One
can pass MB_ERR_INVALID_CHARS for flags to force the function to fail
with ERROR_NO_UNICODE_TRANSLATION error instead.
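
Note that a failure with MB_ERR_INVALID_CHARS only tells you that the
input is bad, not where. If you need the position of the offending byte,
one option is to scan the buffer yourself before converting. Below is a
minimal, portable sketch of that idea. FindFirstInvalidUtf8 is a
hypothetical helper of my own naming, not a Win32 function, and it applies
the strict RFC 3629 rules, which may not match byte for byte what a given
Windows version accepts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the Win32 API).
// Returns the offset of the first byte that does not start a well-formed
// UTF-8 sequence, or -1 if the whole buffer is well-formed.
std::ptrdiff_t FindFirstInvalidUtf8(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::size_t i = 0;
    while (i < len) {
        const unsigned char b = data[i];
        std::size_t extra = 0;               // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the first one

        if (b <= 0x7F)                   { extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { extra = 1; }
        else if (b == 0xE0)              { extra = 2; lo = 0xA0; } // no overlong forms
        else if (b == 0xED)              { extra = 2; hi = 0x9F; } // no surrogates
        else if (b >= 0xE1 && b <= 0xEF) { extra = 2; }
        else if (b == 0xF0)              { extra = 3; lo = 0x90; } // no overlong forms
        else if (b >= 0xF1 && b <= 0xF3) { extra = 3; }
        else if (b == 0xF4)              { extra = 3; hi = 0x8F; } // max U+10FFFF
        else { return static_cast<std::ptrdiff_t>(i); }  // 80..C1, F5..FF

        // Sequence is cut off by the end of the buffer.
        if (extra > len - i - 1) return static_cast<std::ptrdiff_t>(i);

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = data[i + k];
            const unsigned char cMin = (k == 1) ? lo : 0x80;
            const unsigned char cMax = (k == 1) ? hi : 0xBF;
            if (c < cMin || c > cMax) return static_cast<std::ptrdiff_t>(i);
        }
        i += extra + 1;
    }
    return -1;
}
```

For the bytes 31 E9 32 33 34 35 this reports offset 1: E9 announces a
three-byte sequence, but the byte that follows (32) is not a continuation
byte.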
By specifying CP_UTF8, you claim that your string is UTF-8 encoded when
in fact it is not. In particular, é (U+00E9) should be encoded as two
bytes in UTF-8, C3 A9. The sequence of bytes you are passing to
MultiByteToWideChar (31 E9 32 33 34 35) is not a valid UTF-8 sequence.
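
To make the byte-level point concrete, here is a small sketch of how a
code point is turned into UTF-8 bytes. EncodeUtf8 is a hypothetical helper
written for illustration only; it is not something the Win32 API provides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the Win32 API).
// Writes the UTF-8 encoding of code point cp into out, which must have room
// for at least 4 bytes. Returns the number of bytes written, or 0 if cp is a
// surrogate (U+D800..U+DFFF) or lies above U+10FFFF.
std::size_t EncodeUtf8(unsigned long cp, unsigned char* out)
{
    assert(out != nullptr);
    if (cp <= 0x7F) {                       // ASCII: one byte, unchanged
        out[0] = static_cast<unsigned char>(cp);
        return 1;
    }
    if (cp <= 0x7FF) {                      // 110xxxxx 10xxxxxx
        out[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        out[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {     // surrogates are not encodable
        return 0;
    }
    if (cp <= 0xFFFF) {                     // 1110xxxx 10xxxxxx 10xxxxxx
        out[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        out[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
        out[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

EncodeUtf8(0xE9, buf) writes the two bytes C3 A9, whereas the digits
'1'..'5' each stay a single byte. A lone E9 byte is what a single-byte ANSI
code page such as Windows-1252 uses for é, which is not the same thing.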
Here this API will not be able to convert the "é" character, because
it is already encoded.
This statement makes no sense to me, sorry. In what sense is this
character encoded while surrounding characters aren't?
One more thing: if the é char is there twice (for ex: "1éééé2345"),
then it returns ZERO.
In my experiments, it returns 6 when I pass 0 for dwFlags, and zero when
I pass MB_PRECOMPOSED for dwFlags - consistent with the documentation.
--
With best wishes,
Igor Tandetnik
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925