Re: Confusing result from std::string::length()
On Jun 4, 6:43 am, bork <koznobik...@gmail.com> wrote:
{ a broken character '?' substituted manually -mod }
Bo M?ller:
[snip]
I have stumbled upon a problem getting the length of the original
string.
As long as the string contains only ASCII chars there are no problems. When
special chars are present in the string I get the wrong length. It seems
'?' is two chars and not one.
I need an explanation of this and some pointers on how to solve it.
[snip]
To solve your problem you probably need a third-party tool such as the
ICU library (http://site.icu-project.org/).
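To make the quoted symptom concrete, here is a minimal sketch, assuming
the string holds valid UTF-8 (the original post does not say which
encoding is in use). std::string::length() counts bytes, and in UTF-8 any
non-ASCII character occupies two or more of them. The helper name
utf8_code_points is hypothetical, not part of the standard library or ICU.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a string ASSUMED to be valid UTF-8.
// UTF-8 continuation bytes have the bit pattern 10xxxxxx, so every byte
// that does not match that pattern begins a new code point.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the letters c, a, f followed by a two-byte
character), length() reports 5 while utf8_code_points() reports 4. Note
that code points are still not user-perceived characters when combining
marks are involved, which is one reason a full library is suggested.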
Quick note:
A few weeks ago, I did a semi-thorough review of ICU. I told myself
before doing the review:
"This is one instance where you absolutely, positively ~will~ use the
existing code and not try to do your own! Wrap it if necessary, but do
not try to change it!"
Then I saw that ICU was:
1. using int to represent inherently unsigned quantities. "No big
deal. Stylistic issue easily accommodated."
2. using a strange model for synchronization, testing if state is
mutable, etc. "Ok... maybe I can work around that."
3. using type UBool as bool, being a typedef for int8_t, which causes
problems with my type system. "Hmm..."
4. not using exceptions. "Wow! Red flag, but... still might be OK..."
Then I saw this:
5. UBool UnicodeString::isBogus()
http://icu-project.org/apiref/icu4c/classUnicodeString.html
And that was the straw that broke the camel's back.
Because exceptions are not used, an operation that would leave a string
in an anomalous state cannot report the failure with throw. All it can
do is put the string into a bogus state, so it is possible to have
"bogus" strings extant throughout the system. The programmer needs to
invoke isBogus() on each string to test whether it is bogus, which
implies that he must remain cognizant of all the various situations
where "bogusness" might occur. If isBogus() returns TRUE, the object
is useless even though it still exists. That was too much for me to
bear and, IMO, is anathema to OOP.
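The contrast can be sketched as follows. This is an illustration only:
FlaggedText, CheckedText and looks_like_utf8 are hypothetical names
invented for the example, none of it is ICU code, and the validity check
is deliberately simplified (it checks byte structure only, not overlong
forms or surrogates).

```cpp
#include <stdexcept>
#include <string>

// Simplified structural check: each lead byte must be followed by the
// right number of continuation bytes (10xxxxxx).
bool looks_like_utf8(const std::string& s)
{
    std::string::size_type i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        int extra;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;  // stray continuation byte or invalid lead byte
        ++i;
        for (int k = 0; k < extra; ++k, ++i) {
            if (i >= s.size()) return false;  // sequence cut short
            if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
                return false;
        }
    }
    return true;
}

// Style A: failure is recorded inside the object. The object exists
// either way, and every caller must remember to ask.
class FlaggedText {
public:
    explicit FlaggedText(const std::string& utf8)
        : bytes_(utf8), bogus_(!looks_like_utf8(utf8)) {}
    bool is_bogus() const { return bogus_; }
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
    bool bogus_;
};

// Style B: the constructor throws, so an invalid object never exists
// and no later check is needed.
class CheckedText {
public:
    explicit CheckedText(const std::string& utf8) : bytes_(utf8)
    {
        if (!looks_like_utf8(utf8))
            throw std::invalid_argument("malformed UTF-8");
    }
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};
```

With style A, FlaggedText("\xC3") yields a live object whose is_bogus()
returns true, and forgetting the check lets it travel through the
program. With style B, the same input throws std::invalid_argument at
the point of construction.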
I do not accept the excuse that the library got its start during a
period when exceptions were not available on all supported compilers.
At some point, it is better to just focus on the code and get it
(mostly) right, and if a compiler is still so poor that it cannot
support something as fundamental as exceptions, forsake that compiler.
I started anew with my own Unicode string class.
-Le Chaud Lapin-
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]