Re: Want Input boxes to accept unicode strings on Standard Window

From: "Tom Serface" <tom.nospam@camaswood.com>
Newsgroups: microsoft.public.vc.mfc
Date: Tue, 24 Jul 2007 22:47:24 -0700
Message-ID: <483B17F5-5E60-492B-9F9D-0079039C6F10@microsoft.com>

"David Ching" <dc@remove-this.dcsoft.com> wrote in message
news:xJxpi.27927$2v1.1892@newssvr14.news.prodigy.net...

But wouldn't the MFC libraries still display in English even in the
UNICODE build? Building in UNICODE doesn't fix that.... Actually, we've
statically linked to the MFC English version for years, and have never had
an issue (at least none have been reported), probably because no MFC UI is
normally displayed.

Yes, you're right about that. That happens based on the Windows
installation so far as I can tell.

By this do you mean by setting the Regional Control Panel or
SetThreadLocale() appropriately? I did some tests the other day and saw
that MultiByteToWideChar(CP_ACP, ...) converted an MBCS string to Unicode
differently based on the Regional Control Panel setting. My take was that
setting the Regional Control Panel altered CP_ACP. I presume
SetThreadLocale() does the same thing, albeit only for the calling thread
and not on a system global basis.
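
To illustrate why the same bytes come out differently when the code page
changes, here is a minimal portable sketch. It is not the Windows API: the
CodePage enum and DecodeSingleByte() are hypothetical names made up for this
example, and only ASCII plus the 0xC0..0xFF range of Windows-1252 and
Windows-1251 are handled. On Windows, the equivalent would be passing an
explicit code page number to MultiByteToWideChar() instead of CP_ACP.

```cpp
#include <string>

// Hypothetical identifiers for the two code pages this sketch understands.
enum class CodePage { Windows1252, Windows1251 };

// Decode single-byte text to UTF-16 code units.
// Assumption: only ASCII and bytes 0xC0..0xFF are mapped; any other byte
// becomes U+FFFD, which a real conversion table would not do.
std::u16string DecodeSingleByte(const std::string& bytes, CodePage cp)
{
    std::u16string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII is identical in both code pages.
            out.push_back(static_cast<char16_t>(b));
        } else if (b >= 0xC0 && cp == CodePage::Windows1252) {
            // Windows-1252: 0xC0..0xFF map to U+00C0..U+00FF.
            out.push_back(static_cast<char16_t>(b));
        } else if (b >= 0xC0 && cp == CodePage::Windows1251) {
            // Windows-1251: 0xC0..0xFF map to Cyrillic U+0410..U+044F.
            out.push_back(static_cast<char16_t>(0x0410 + (b - 0xC0)));
        } else {
            out.push_back(static_cast<char16_t>(0xFFFD));
        }
    }
    return out;
}
```

The single byte 0xE9 decodes to U+00E9 under Windows-1252 and to U+0439
under Windows-1251, so the text has to carry its code page with it or the
result depends on whoever reads it.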

The problem, for me, has been that I don't know what language will
eventually be used. We even tried embedding the code page number in our
text file, but still had problems reading some files under different code
pages. It was a lot less hassle with Unicode.

Yes, these were all very well known (and grudgingly accepted) problems in
the Win9x world where Unicode was not very well supported.

Yeah, but we're caring less about that all the time ;o)

For XML, even if you have an ANSI (non-Unicode) XML file, if the first
line has at least

<?xml version="1.0" encoding="<insert encoding>"?>

then IE displays the XML file correctly. (IE has become our default XML
viewer.) So the "encoding" attribute means a lot here. I still don't
know if saving the XML file in Unicode (with the 0xFEFF BOM) causes the
text to be displayed correctly regardless of the "encoding" attribute.
Our little XML parser does not read Unicode XML files, nor does it honor
the "encoding" attribute. Therefore, even though it returns LPWSTR
strings, they have been converted to Unicode strings based on the CP_ACP
codepage, and that (seems to) require the Regional Control Panel to be set
to the language that was used to create the XML file. Do you know if
MSXML or some of the "big boys" or FirstObject parsers read Unicode files?
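
For a small parser that wants to tell Unicode files apart from ANSI ones
before converting anything, a rough sketch of BOM sniffing follows. BomKind
and DetectBom() are hypothetical names for this example, and the sketch only
looks at the first bytes of the buffer; it does not read the "encoding"
attribute.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
enum class BomKind { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file buffer for a byte order mark (U+FEFF).
// Assumption: UTF-32 is not considered, so a UTF-32LE file (FF FE 00 00)
// would be reported as UTF-16LE here.
BomKind DetectBom(const std::string& data)
{
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(data[i]);
    };
    if (data.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return BomKind::Utf8;      // U+FEFF encoded as UTF-8
    if (data.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return BomKind::Utf16LE;   // U+FEFF stored low byte first
    if (data.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return BomKind::Utf16BE;   // U+FEFF stored high byte first
    return BomKind::None;          // fall back to the declared encoding
}
```

When no BOM is found, the parser still has to fall back on the "encoding"
attribute or some default, which is where the code page guessing comes back.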

I think CMarkup handles Unicode, but I haven't tried MSXML. I know Xerces
handles it as well. I tried using the encoding= thing, but it has the same
problems when a file saved under one language is read under another. I could
be wrong, but I found the whole business of balancing code pages more trouble
than I thought it was worth.

UNICODE builds make it easier to display Asian text, but our problem is
how to construct reliable LPWSTR from things like XML files.

I use the Xerces parser and I've never had a problem with reading or saving
Unicode files. Actually I store my XML in UTF-8 to compact them a bit.
Seems to work OK.
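
As a rough illustration of why UTF-8 sidesteps the code page issue, here is
a portable sketch of turning UTF-8 bytes into UTF-16 code units without
consulting any locale. This is not what Xerces does internally;
Utf8ToUtf16() is a hypothetical helper, and replacing each bad byte with
U+FFFD is just one possible error policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 into UTF-16 code units, independent of any code page.
// Assumption: each malformed byte is replaced by U+FFFD and skipped.
std::u16string Utf8ToUtf16(const std::string& in)
{
    const char16_t kReplacement = static_cast<char16_t>(0xFFFD);
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::uint32_t min = 0;   // smallest code point allowed for this length
        std::size_t len = 0;     // total bytes in the sequence
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else { out.push_back(kReplacement); ++i; continue; }

        bool ok = (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;   // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;
        if (!ok) { out.push_back(kReplacement); ++i; continue; }

        if (cp >= 0x10000) {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
        i += len;
    }
    return out;
}
```

On Windows the same job is normally done with MultiByteToWideChar() and
CP_UTF8, which likewise does not depend on the Regional Control Panel
setting.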

In some cases it was not straightforward to port from ANSI to Unicode
because the code relies on single-byte character strings to do its job:
things from driver-land which wouldn't know what to do with a UNICODE
string, even if we could train device driver writers about UNICODE! ;)

Can't argue with that and I understand your point. Of course, if you are
relying on single-byte strings, you're going to have trouble with MBCS in
Asian languages as well :o)
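
A classic example of that trouble, sketched in portable C++ under the
assumption that the text is Shift-JIS: the trail byte of a double-byte
character can be 0x5C, which a byte-wise scan mistakes for a backslash. The
function names below are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Shift-JIS lead bytes fall in 0x81..0x9F and 0xE0..0xFC.
bool IsSjisLeadByte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Naive single-byte logic: treats every 0x5C byte as a path separator.
std::size_t FindLastBackslashNaive(const std::string& path)
{
    return path.rfind('\\');
}

// Lead-byte-aware scan: skips the trail byte of each double-byte
// character so a 0x5C trail byte is never taken for a separator.
std::size_t FindLastBackslashSjis(const std::string& path)
{
    std::size_t last = std::string::npos;
    for (std::size_t i = 0; i < path.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(path[i]);
        if (IsSjisLeadByte(b)) {
            ++i;                 // step over the trail byte as well
        } else if (b == 0x5C) {
            last = i;            // a genuine backslash
        }
    }
    return last;
}
```

For the bytes "C:\" followed by 0x95 0x5C and ".txt", the naive search
reports the separator at offset 4, in the middle of a character, while the
lead-byte-aware scan reports offset 2. In MFC code the CRT's _mbsrchr() is
the usual way to get the second behaviour.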

Tom