Re: Best way to handle UTF-8 in C++
On 5/6/2010 1:39 PM, Peter Olcott wrote:
"Victor Bazarov"<v.bazarov@comcast.invalid> wrote in
message news:hruqhc$lo6$1@news.eternal-september.org...
On 5/6/2010 10:11 AM, Peter Olcott wrote:
"Victor Bazarov"<v.bazarov@comcast.invalid> wrote in
message news:hruhqu$hqt$1@news.eternal-september.org...
On 5/6/2010 9:45 AM, Peter Olcott wrote:
I am looking for a way to handle UTF-8 text in my C++
application. The ideal case would be an STL class that
handles UTF-8. What is the next best thing?
What do you mean by "handle"? STL class? Don't you have
the compiler documentation? If there is one, you already
have all information you need. Want more? Buy a book on
the Standard library. There are several that many
consider decent. Next best thing? Google.
I must be able to use UTF-8 strings in my C++
application. I want to know the best way to do this. I
prefer an interface that works the same way as the STL
interface.
What do you mean by "use" and in what way can't you "use"
the UTF-8 strings already? There is no such thing as "STL
interface", perhaps you can explain what you mean by "the
same way". I can start guessing, but it's much better if
you just specify what exactly you're trying to accomplish.
Try to refrain from using such generic terms as "STL
interface" or "use". For example, you can say, "I need to
be able to figure out whether there are uppercase
characters in my 'string', like the standard function
'isupper' does"...
I want a string class that works exactly the same way as
std::string, except implements UTF-8.
...as opposed to *what*? UTF-8 is an encoding scheme. 'std::string'
does *not* have an encoding scheme, it's a mere container of 'char'.
Nothing more, nothing less. What *exactly* in it doesn't work NOW for
you? Have you tried making the default 'char' unsigned? If your
platform has 8-bit chars, and you make them unsigned, you've got yourself
a UTF-8 storage type. And 'std::string' will provide functionality for
storing elements of that type (by virtue of being defined as
'std::basic_string<char>'), and operations to manipulate that storage
(append to, erase from, enumerate, etc.)
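To illustrate the "mere container of 'char'" point, here is a minimal sketch, assuming 8-bit chars. The sample text and the helper names 'make_sample' and 'storage_demo' are made up for the illustration. It stores UTF-8 bytes in a plain 'std::string' and uses only byte-level operations:

```cpp
#include <cassert>
#include <string>

// Hypothetical sample: the UTF-8 bytes of "naive" spelled with U+00EF
// (i with diaeresis), which UTF-8 encodes as the two bytes 0xC3 0xAF.
inline std::string make_sample()
{
    std::string s = "na";
    s += "\xC3\xAF";   // two chars, one character
    s += "ve";
    return s;
}

// The usual member functions work unchanged, but they count and
// address chars (bytes), not characters.
inline void storage_demo()
{
    std::string s = make_sample();
    assert(s.size() == 6u);              // 5 characters, 6 bytes
    assert(s.find("\xC3\xAF") == 2u);    // byte-wise substring search
    s.erase(2, 2);                       // remove the whole two-byte sequence
    assert(s == "nave");
}
```

Note that 'erase(2, 1)' would compile just as well and leave half of a sequence behind; the container neither knows nor cares.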
So, once again, what do you mean by "implements UTF-8"?
This means that the
interface can remain the same, (all of the member functions
have the same name and same parameters) but the underlying
meaning may be different.
"May be different"? If I rewrite 'std::string' for you and just make
all functions return 0 and do nothing, would that be acceptable? That's
a rhetorical question, BTW. If you just allow the "meaning" to be
different, you still haven't specified anything. Does it *have to* be
different? In what way?
Could it be that you don't know yet what you *need* from your class,
which you hope will "handle" UTF-8? What *operations* do you hope it
will help you perform on your "UTF-8" strings?
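As an example of what a concrete answer might look like, suppose the operation were "count characters rather than bytes". A rough sketch follows, assuming the string already holds well-formed UTF-8; the names 'utf8_length' and 'utf8_length_demo' are invented here, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points by skipping continuation bytes, i.e. bytes whose
// top two bits are 10. No validation is done: malformed input gives
// a meaningless count.
inline std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Go through unsigned char so the mask works whether or not
        // plain 'char' is signed on this platform.
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

inline void utf8_length_demo()
{
    std::string euro = "\xE2\x82\xAC";   // U+20AC, three bytes
    assert(euro.size() == 3u);           // what std::string reports
    assert(utf8_length(euro) == 1u);     // what the caller may want
}
```

Whether that is the operation you need, or whether you need indexing, case conversion, or collation instead, is exactly what has to be specified first.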
V
--
I do not respond to top-posted replies, please don't ask