Re: isspace
On 3 Feb, 09:56, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
On Fri, 2010-01-29, Paavo Helde wrote:
...
Or I could keep the text in UTF-8 and use my own custom function for
checking for the whitespace, checking directly for all Unicode whitespace
characters as listed in http://en.wikipedia.org/wiki/Whitespace_%28computer_science%29,
this seems to me much less error-prone than
worrying if Russian locale and std::isspace are working correctly on all
platforms.
Worrying? "I don't support doing analysis of Russian text on a
platform with broken Russian locales" sounds like something you can
happily say.
/Jorgen
OK, to summarize what I have learned so far:
UTF-8 can be handled by simply using std::string (hence char).
UTF-16 and UTF-32 are handled by std::wstring and wchar_t, but this is not
reliable because the size of wchar_t is implementation-specific.
Now, something like:
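#include <fstream>
#include <iostream>
#include <locale>
#include <string>

int main(int argc, char* argv[])
{
    if (argc < 2) return 1;  // expects the input file name as argument

    std::ifstream is;
    // Windows-style locale name; the std::locale constructor throws
    // std::runtime_error if the platform does not know this name.
    const std::locale filelocale("Russian_Russia.1251");
    is.imbue(filelocale);
    is.open(argv[1]);

    std::string s;
    while (std::getline(is, s))
    {
        for (std::string::const_iterator it = s.begin(); it != s.end(); ++it)
        {
            std::cout << *it;
            // std::isspace(char, const std::locale&) from <locale>
            if (std::isspace(*it, filelocale))
                std::cout << "space found!" << std::endl;
        }
        std::cout << std::endl;
    }
}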
This works if we give a Russian text as input (although cout isn't
able to display the Russian characters correctly).
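If we are under Linux, something like:
    std::locale filelocale;  // starts as a copy of the global locale
    try
    {
        // Windows-style name
        filelocale = std::locale("Russian_Russia.1251");
    }
    catch (...)
    {
        // Not known here (e.g. on Linux): try a POSIX-style name instead.
        // If this one fails too, the exception simply propagates,
        // so no explicit rethrow is needed.
        filelocale = std::locale("ru_RU.UTF-8");
    }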
Could this work? Any suggestions? (I don't even know which specific
exception has to be caught; I'm just experimenting...)
Thanks, Mattia