Re: UTF-8 messages in exceptions ?

From:

Timothy Madden <terminatorul@gmail.com>

Newsgroups:

comp.lang.c++.moderated

Date:

Wed, 21 Jul 2010 18:46:28 CST

Message-ID:

<4c4711ad$0$272$14726298@news.sunsite.dk>

Goran wrote:

On Jul 18, 12:28 am, Timothy Madden <terminato...@gmail.com> wrote:

Hello

I need to write some wrapper classes around a library that my client has,
and the error messages (and all the other strings in the library) are in
UTF-8. Can I somehow create an exception class derived from std::exception
(std::runtime_error) that could carry such messages ?

I mean the message returned by std::exception::what() is assumed to be in the
application locale, and I cannot just set the application locale to UTF-8.

If the standard library and the other libraries you use aren't localized,
then their messages are most likely in English (plain ASCII), and that's
also valid UTF-8. So when you output what() to something UTF-8 aware, it's OK.

If they are localized, and are using specific locale (not UTF-8),
whoops! How about some simple mix-in derivation, e.g.:

class utf8_error
{
public:
   // Message text, known to be UTF-8 encoded
   virtual const char* what_utf8() const = 0;
};

then,

class my_error : public runtime_error, public utf8_error
{
   // Implement what and what_utf8
};

and finally, in your catch handlers, use:

string utf8_ed_what(const exception& e)
{
   const utf8_error* utf8 = dynamic_cast<const utf8_error*>(&e);
   if (utf8)
     return utf8->what_utf8();
   else
     return locale_text_to_utf8(e.what());
}

BTW, application locale is assumed? How? (Honest question).

Yes, maybe this would currently be the only practical work-around to
this rather theoretical problem.
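
For reference, a self-contained version of that mix-in idea might look
like the sketch below. It assumes a C++11 or later compiler. The
constructor of my_error and the body of locale_text_to_utf8() are my own
placeholders (the latter just passes the bytes through), so treat it as
an illustration rather than a finished implementation.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Mix-in interface for exceptions whose message is known to be UTF-8.
class utf8_error
{
public:
    virtual ~utf8_error() = default;
    virtual const char* what_utf8() const = 0;
};

// Hypothetical wrapper exception: stores the library's UTF-8 text as-is.
class my_error : public std::runtime_error, public utf8_error
{
public:
    explicit my_error(const std::string& utf8_msg)
        : std::runtime_error(utf8_msg) {}

    const char* what_utf8() const override { return what(); }
};

// Placeholder (assumption): a real version would convert from the locale's
// charset to UTF-8. Here the bytes are returned unchanged.
inline std::string locale_text_to_utf8(const char* text)
{
    return std::string(text);
}

inline std::string utf8_ed_what(const std::exception& e)
{
    // Cross-cast from std::exception to the mix-in; this works because
    // std::exception is a polymorphic type.
    const utf8_error* utf8 = dynamic_cast<const utf8_error*>(&e);
    return utf8 ? std::string(utf8->what_utf8())
                : locale_text_to_utf8(e.what());
}
```

A catch (const std::exception& e) handler would then call
utf8_ed_what(e) instead of e.what().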

The thing is that I, like other programmers, am not too fond of
dynamic_cast and run-time type identification.

So what I did was to just put the UTF-8 string in the std::exception,
and have my error reporting function, invoked from catch(), always
decode the string as UTF-8. Essentially I am just hoping that the
standard library and other libraries use only 7-bit ASCII what()
messages in exceptions, which are compatible with UTF-8.
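
A rough sketch of what such a reporting helper could check is below. The
function names are made up for illustration, and the UTF-8 test is only
structural (lead byte plus the right number of continuation bytes), so it
is a sanity check under those assumptions and not a full validator.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// True if every byte is 7-bit ASCII, which is also valid UTF-8.
inline bool is_ascii(const std::string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        if (static_cast<unsigned char>(s[i]) > 0x7F)
            return false;
    return true;
}

// Structural UTF-8 check: each lead byte must be followed by the expected
// number of continuation bytes (10xxxxxx). It does not reject every
// overlong or surrogate encoding.
inline bool looks_like_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c <= 0x7F)                   extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return false;               // stray continuation or invalid lead

        if (extra > s.size() - i - 1)
            return false;                // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}

// Hypothetical reporting helper: treats what() as UTF-8 and substitutes a
// fixed ASCII notice when the bytes cannot be UTF-8.
inline std::string message_as_utf8(const std::exception& e)
{
    const std::string msg = e.what();
    return looks_like_utf8(msg) ? msg
                                : std::string("(message is not valid UTF-8)");
}
```

With this, a Latin-1 message from some other library would at least be
detected instead of being displayed as garbage.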

About assuming the application locale for what() strings: the idea is
that the string should be human-readable, so it should be possible to
output it to stdout, which implies the string uses the charset of the
current locale.

However what the standard says (18.6.1.8) is:

    virtual const char* what() const throw();

    Returns: An implementation-defined NTBS.
    Notes: The message may be a null-terminated multibyte string
(17.3.2.1.3.2), suitable for conversion and display as a wstring (21.2,
22.2.1.5).

Here NTBS stands for null-terminated byte string. The last reference
(22.2.1.5) is to the codecvt<internT,externT,stateT> class template. The
only codecvt<> instantiation required by the standard that actually
performs a conversion, codecvt<wchar_t,char,mbstate_t>, "convert(s) the
implementation-defined native character set" between wchar_t and char.

I am unsure what the "native character set" would be, but I guess the
current locale would match it.
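
To make that concrete, here is a minimal sketch of widening a what()
string through the codecvt<wchar_t,char,mbstate_t> facet of a given
locale. The helper name widen() is hypothetical, and which narrow
encoding the facet actually decodes is up to the implementation and the
locale passed in.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Widen a narrow string using the codecvt<wchar_t, char, mbstate_t> facet
// of the given locale (by default the current global locale).
inline std::wstring widen(const std::string& narrow,
                          const std::locale& loc = std::locale())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (narrow.empty())
        return std::wstring();

    // Output never needs more wide characters than there are input bytes.
    std::wstring wide(narrow.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;

    const std::codecvt_base::result r = cvt.in(
        state,
        narrow.data(), narrow.data() + narrow.size(), from_next,
        &wide[0], &wide[0] + wide.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("narrow-to-wide conversion failed");

    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}
```

A handler could then write widen(e.what()) to std::wcerr, and only bytes
valid in that locale's narrow encoding would convert successfully.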

Thank you,
Timothy Madden