Re: Want Input boxes to accept unicode strings on Standard Window
"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message
news:Xns99784934397AMihaiN@207.46.248.16...
<?xml version="1.0" encoding="<insert encoding>"?>
If the encoding is not specified, then the encoding is assumed to be UTF-8
(this is what the standard says).
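A minimal Python sketch of that default rule, for illustration only. The helper name declared_encoding is hypothetical, and the sketch assumes the document starts directly with an ASCII-compatible XML declaration (so it does not handle UTF-16 input or a leading BOM):

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# Assumption: the bytes are ASCII-compatible (UTF-8, ISO-8859-x, ...).
_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(data: bytes) -> str:
    """Return the declared encoding name, or 'utf-8' if none is declared."""
    m = _DECL.match(data)
    return m.group(1).decode("ascii").lower() if m else "utf-8"

print(declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(declared_encoding(b'<?xml version="1.0"?><a/>'))
```

A real parser would combine this with a byte order mark check before trusting the declaration.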
Ah, UTF-8. I know you discussed this at length several months ago here, but
to be honest, this is my understanding of it: it is an 8-bit encoding
scheme no different than Ansi (that's how it fits in 8 bits). Since it is
8 bits, it cannot specify everything an LPWSTR can. Yet it is somehow
supposed to be better than Ansi, not reliant on any codepage. But if it's
only 8 bits, how is that?
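For what it is worth, the "8" in UTF-8 refers to the size of the code unit, not of the character: one character may take one to four 8-bit units. A small Python sketch of the bit layout follows. The function name utf8_encode is hypothetical, and the sketch assumes a valid code point (it does not reject surrogates or values above U+10FFFF):

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0x80:                      # 7 bits: same byte as ASCII
        return bytes([cp])
    if cp < 0x800:                     # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 21 bits: 11110xxx plus three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# 'A', e-acute, a CJK ideograph, and a character outside the BMP
for ch in ("A", "\u00e9", "\u65e5", "\U0001F600"):
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")   # cross-check against Python's codec
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```

The four sample characters come out as 1, 2, 3 and 4 bytes respectively, which is how UTF-8 covers all of Unicode without depending on a codepage.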
And UTF-8 raises the question about UTF-16. Is UTF-16 the same as what
Windows Notepad (in the Save As dialog) calls "Unicode"? Or is Windows'
concept of Unicode and LPWSTR different than UTF-16?
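As far as I can tell, Notepad's "Unicode" option writes UTF-16 little-endian with a leading byte order mark, and an LPWSTR on current Windows points at the same 16-bit UTF-16 code units. A short Python sketch of what such a file contains, assuming that reading is right (the sample text is arbitrary):

```python
import codecs

# 'A', a CJK ideograph, and a character outside the BMP
text = "A\u65e5\U0001F600"

utf16 = text.encode("utf-16-le")         # 16-bit units, little-endian, no BOM
with_bom = codecs.BOM_UTF16_LE + utf16   # BOM U+FEFF is stored as bytes FF FE

print(with_bom.hex())  # fffe4100e5653dd800de

# Split into 16-bit code units: the last character needs a surrogate pair.
units = [int.from_bytes(utf16[i:i + 2], "little")
         for i in range(0, len(utf16), 2)]
print([f"{u:04X}" for u in units])  # ['0041', '65E5', 'D83D', 'DE00']
```

Three characters become four 16-bit units, so UTF-16 is also a variable-length encoding, just with a larger unit than UTF-8.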
I still don't know if saving the XML file in Unicode (with the 0xFEFF BOM)
causes the text to be displayed correctly regardless of the "encoding"
attribute.
That would be wrong according to the standard. And it is not supported by IE.
OK, thanks. From what you, Tom, and David W. say, UTF-8 is the way to go
when producing XML files.
The standard (http://www.w3.org/TR/2006/REC-xml11-20060816/#charencoding)
says: "All XML processors MUST be able to read entities in both the UTF-8
and UTF-16 encodings."
So an XML parser without UTF-16 and UTF-8 is not an XML parser, it is a
hack.
OK, now I'm confused. How can you save a UTF-16 file (which you say is
standard) without the 0xFEFF BOM (which you say is not standard)?
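My reading of the spec's character encoding section is that a UTF-16 entity is expected to start with a BOM, a UTF-8 entity may start with one, and the BOM must not contradict the encoding declaration. A hedged Python sketch of BOM-based detection follows. The function name decode_xml_bytes is hypothetical, UTF-32 is deliberately ignored, and the encoding declaration itself is not consulted here:

```python
import codecs

def decode_xml_bytes(data: bytes) -> str:
    """Decode XML bytes using only the byte order mark; default to UTF-8."""
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM, picks the byte order, and drops it.
        return data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the optional UTF-8 signature EF BB BF.
        return data.decode("utf-8-sig")
    # No BOM: treat as UTF-8 (simplification; a declaration may say otherwise).
    return data.decode("utf-8")

sample = "<a>\u65e5\u672c\u8a9e</a>"
print(decode_xml_bytes(codecs.BOM_UTF16_LE + sample.encode("utf-16-le")) == sample)
print(decode_xml_bytes(sample.encode("utf-8")) == sample)
```

A complete parser would then compare the result with the declared encoding and report a mismatch rather than silently preferring one of them.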
Do you know if MSXML or some of the "big boys"
or FirstObject parsers read Unicode files?
All of them do, if they claim to be XML parsers. If not, they are toys.
MSXML, Xerces, Expat, all handle UTF-8, UTF-16, and support encoding.
Well, I guess I would just say that XML wouldn't be the standard that it is
if it required these kinds of XML parsers to be universal. These types of
parsers have severe redist issues (some are 5 MB big) or calling conventions
(e.g. MSXML uses COM) that prevented them from being attractive alternatives
for us.
Our parser does not have all this support, but the art is in finding one
that holds true to our KISS (keep it simple, stupid) goals yet still
preserves Asian languages. I would hope a parser need not be 5 MB large to
support Asian languages.
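To give a sense of scale, here is a hedged Python sketch of a hand-written UTF-8 decoder, which is the core of what a small parser needs before it can hand Asian text to a wide-character API. The name utf8_decode is hypothetical, and the validation is simplified: it rejects bad lead bytes and broken continuation bytes but does not check for overlong three- and four-byte forms or encoded surrogates.

```python
def utf8_decode(data: bytes) -> str:
    """Decode UTF-8 bytes into a string, raising ValueError on bad input."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # single byte, ASCII range
            n, cp = 0, b
        elif 0xC2 <= b <= 0xDF:      # lead byte of a 2-byte sequence
            n, cp = 1, b & 0x1F
        elif 0xE0 <= b <= 0xEF:      # lead byte of a 3-byte sequence
            n, cp = 2, b & 0x0F
        elif 0xF0 <= b <= 0xF4:      # lead byte of a 4-byte sequence
            n, cp = 3, b & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        tail = data[i + 1:i + 1 + n]
        if len(tail) != n or any((t & 0xC0) != 0x80 for t in tail):
            raise ValueError(f"truncated or invalid sequence at offset {i}")
        for t in tail:               # each continuation byte adds 6 bits
            cp = (cp << 6) | (t & 0x3F)
        out.append(chr(cp))
        i += 1 + n
    return "".join(out)

print(utf8_decode("\u65e5\u672c\u8a9e text".encode("utf-8")))
```

The equivalent logic in C or C++ is a few dozen lines, so encoding support by itself should not be what makes a parser large.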
Yes, and I'm not happy with that, but our scheme seems to have been
acceptable so far. Perhaps the results aren't so great, it's just that the
poor people affected by this are so used to it, they don't complain.
Or they stopped buying your product and moved to something better.
When we released our product, it was an Ansi product because Win9x needed to
be supported (and we couldn't redist the MS Unicode for Win9x, which we'd
heard had problems anyway). Since it was Ansi, it used the Ansi codepage.
And therefore we didn't care if our XML files were Ansi either.
Someone else ported the product to Unicode, but apparently is still refining
the XML part. I got bit when I returned to this product and stumbled on
these XML issues.
This product has deployed millions of copies worldwide, and if it is
possible, is even more conscious about localization and global acceptance
than your current company.
-- David