Re: UTF-8 messages in exceptions?
Goran wrote:
On Jul 18, 12:28 am, Timothy Madden <terminato...@gmail.com> wrote:
Hello,
I need to write some wrapper classes around a library that my client has,
and the error messages (and all the other strings in the library) are in
UTF-8. Can I somehow create an exception class derived from std::exception
(or std::runtime_error) that could carry such messages?
I mean, the message returned by std::exception::what() is assumed to be in the
application locale, and I cannot just set the application locale to UTF-8.
If the standard library and the other libraries you use aren't localized,
then their messages are most likely plain English (7-bit ASCII), which is
already valid UTF-8. So when you output what() to something UTF-8 aware, it's OK.
If they are localized and use a specific locale (not UTF-8), whoops!
How about some simple mix-in derivation, e.g.:
class utf8_error
{
public:
    virtual ~utf8_error() {}
    virtual const char* what_utf8() const = 0;
};
then,
class my_error : public std::runtime_error, public utf8_error
{
    // Implement what() and what_utf8()
};
and finally, in your catch handlers, use:
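std::string utf8_ed_what(const std::exception& e)
{
    // Cross-cast: non-null only if e's dynamic type also derives from utf8_error
    const utf8_error* utf8 = dynamic_cast<const utf8_error*>(&e);
    if (utf8)
        return utf8->what_utf8();
    else
        return locale_text_to_utf8(e.what()); // converts from the locale charset (not shown)
}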
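For reference, a self-contained sketch of the mix-in above, with the parts left open filled in under stated assumptions: the two-argument my_error constructor (locale text for what(), UTF-8 text for what_utf8()) is an illustrative choice, and locale_text_to_utf8() here is a hypothetical stub that assumes the locale text is 7-bit ASCII and passes it through unchanged. A real version would transcode from the locale's charset.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Mix-in interface: exceptions that can also report their message in UTF-8.
class utf8_error
{
public:
    virtual ~utf8_error() {}
    virtual const char* what_utf8() const = 0;
};

// Concrete exception: what() returns text meant for the application locale,
// what_utf8() returns the library's original UTF-8 text.
class my_error : public std::runtime_error, public utf8_error
{
public:
    my_error(const std::string& locale_text, const std::string& utf8_text)
        : std::runtime_error(locale_text), utf8_text_(utf8_text) {}

    const char* what_utf8() const override { return utf8_text_.c_str(); }

private:
    std::string utf8_text_;
};

// HYPOTHETICAL stub: assumes the locale text is 7-bit ASCII (already valid
// UTF-8) and returns it unchanged; no real transcoding happens here.
std::string locale_text_to_utf8(const char* text)
{
    return std::string(text);
}

std::string utf8_ed_what(const std::exception& e)
{
    // Cross-cast: non-null only if e's dynamic type also derives from utf8_error.
    const utf8_error* utf8 = dynamic_cast<const utf8_error*>(&e);
    if (utf8)
        return utf8->what_utf8();
    return locale_text_to_utf8(e.what());
}
```

A handler then catches const std::exception& as usual and calls utf8_ed_what(e); exceptions from other libraries simply take the fallback path.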
BTW, the application locale is assumed? How? (Honest question.)
Yes, this may currently be the only practical workaround for
this rather theoretical problem.
The thing is that I, like other programmers, am not too fond of
dynamic_cast and run-time type identification.
So what I did was simply to put the UTF-8 string in the exception itself
and have my error-reporting function, invoked from catch(), always
decode the string as UTF-8. Essentially, I am just hoping that the
standard library and other libraries use only 7-bit ASCII what()
messages, which are also valid UTF-8.
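That hope can at least be checked at run time. A minimal sketch, assuming the reporting function is willing to validate what() before trusting it: is_valid_utf8() and report_text() are hypothetical names, and the check is a strict one that rejects stray continuation bytes, truncated sequences, overlong forms, UTF-16 surrogates and code points above U+10FFFF, so a localized Latin-1 message would be flagged instead of silently mis-decoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <string>

// Strict UTF-8 validation of a NUL-terminated byte string.
bool is_valid_utf8(const char* s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    while (*p) {
        const unsigned char c = *p;
        std::size_t len;
        std::uint32_t cp;
        if (c < 0x80) { ++p; continue; }             // 7-bit ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                            // stray continuation or bad lead byte
        for (std::size_t i = 1; i < len; ++i) {
            if ((p[i] & 0xC0) != 0x80) return false;  // also stops at the terminating NUL
            cp = (cp << 6) | (p[i] & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        p += len;
    }
    return true;
}

// HYPOTHETICAL reporting helper: treat what() as UTF-8, but flag messages
// that are not valid UTF-8 instead of passing them through.
std::string report_text(const std::exception& e)
{
    const char* msg = e.what();
    return is_valid_utf8(msg) ? std::string(msg)
                              : std::string("[message is not valid UTF-8]");
}
```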
About assuming the application locale for what() strings: the idea is
that the string is human-readable, so it should be possible to output it
to stdout, which implies that the string uses the charset of the
current locale.
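On the "How?" question, it may help to note what "current locale" means by default: at program startup the C library behaves as if setlocale(LC_ALL, "C") had been called, and the global C++ locale is std::locale::classic(). A program only picks up the user's environment if it opts in, e.g. with std::setlocale(LC_ALL, "") or std::locale::global(std::locale("")). A minimal sketch for querying both; the function names are hypothetical.

```cpp
#include <clocale>
#include <locale>
#include <string>

// Name of the C library's current LC_CTYPE category, which governs how
// multibyte char text is interpreted. A null second argument only queries.
std::string c_ctype_locale()
{
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name ? std::string(name) : std::string();
}

// Name of the global C++ locale that a default-constructed std::locale copies.
std::string cpp_global_locale()
{
    return std::locale().name();
}
```

Called at the top of main(), before any opt-in, both should report "C".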
However, what the standard says (18.6.1.8) is:
virtual const char* what() const throw();
Returns: An implementation-defined NTBS.
Notes: The message may be a null-terminated multibyte string
(17.3.2.1.3.2), suitable for conversion and display as a wstring (21.2,
22.2.1.5).
Here NTBS stands for null-terminated byte string. The last reference
(22.2.1.5) is to the codecvt<internT,externT,stateT> class template, and
the only codecvt<> instantiation required by the standard that actually
performs a conversion, codecvt<wchar_t,char,mbstate_t>, "convert(s) the
implementation-defined native character set" between wchar_t and char.
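The "conversion and display as a wstring" that the note mentions can be sketched by running what() through the codecvt<wchar_t, char, mbstate_t> facet of a chosen locale. widen_what() is a hypothetical helper, and which locale to pass (std::locale::classic(), or std::locale("") for the user's environment) is exactly the open question in this thread.

```cpp
#include <cstring>
#include <cwchar>
#include <exception>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Widen e.what() using the locale's codecvt<wchar_t, char, mbstate_t> facet.
std::wstring widen_what(const std::exception& e, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_type;
    const cvt_type& cvt = std::use_facet<cvt_type>(loc);

    const char* from = e.what();
    const char* from_end = from + std::strlen(from);
    const char* from_next = from;

    // Each wide character consumes at least one byte, so this is large enough
    // (+1 keeps the buffer non-empty for an empty message).
    std::vector<wchar_t> buf(static_cast<std::size_t>(from_end - from) + 1);
    wchar_t* to_next = &buf[0];

    std::mbstate_t state = std::mbstate_t();
    cvt_type::result r = cvt.in(state, from, from_end, from_next,
                                &buf[0], &buf[0] + buf.size(), to_next);
    if (r != cvt_type::ok)  // error, or an incomplete trailing sequence
        throw std::range_error("what() text is not valid in this locale's encoding");
    return std::wstring(&buf[0], to_next);
}
```

If in() does not report ok, for instance on a byte sequence the chosen locale's encoding rejects, the sketch throws std::range_error rather than returning a truncated string.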
I am not sure what the "native character set" is, but I would guess
that it matches the current locale.
Thank you,
Timothy Madden
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]