Re: Acquiring UTF-8 string length
"Tim Roberts" <timr@probo.com> wrote in message
news:g0i613lssm63bffqa4jgcr4h8p1u0as9pd@4ax.com
> "Igor Tandetnik" <itandetnik@mvps.org> wrote:
>> Well, the question is, again, what do you need this length for. A
>> length in Unicode codepoints is largely useless.
>
> Igor, with all due respect, I don't understand the attitude you've
> shown in this whole thread. What he's asking is perfectly
> reasonable. Despite the fact that his "I<heart>NY" string contains
> six bytes, if it were printed to a UTF-8 console it would only occupy
> four character positions. Why wouldn't I want a way to get that
> information?

Consider combining characters. Consider ligatures. Consider zero-width
characters.
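
To make that concrete, here is a rough sketch in Python (my choice of language for illustration; nothing in this thread depends on it). The function name describe and the "visible" heuristic are made up for this example. It counts UTF-8 bytes, counts code points by skipping continuation bytes, and then makes a crude guess at visible positions by ignoring combining marks and format characters. That guess is not real grapheme clustering and knows nothing about ligatures or fonts, which is rather the point.

```python
import unicodedata

def describe(s):
    """Return (UTF-8 bytes, code points, crude 'visible' count) for s."""
    data = s.encode("utf-8")
    n_bytes = len(data)
    # A code point starts at every byte that is not a continuation byte (10xxxxxx).
    n_codepoints = sum(1 for b in data if (b & 0xC0) != 0x80)
    # Crude approximation only: drop combining marks (Mn, Me) and format
    # characters (Cf). This is NOT grapheme segmentation.
    n_visible = sum(
        1 for ch in s if unicodedata.category(ch) not in ("Mn", "Me", "Cf")
    )
    return n_bytes, n_codepoints, n_visible

heart = describe("I\u2665NY")   # I, BLACK HEART SUIT, N, Y
accent = describe("e\u0301")    # e followed by COMBINING ACUTE ACCENT
zwsp = describe("a\u200Bb")     # a, ZERO WIDTH SPACE, b

print(heart, accent, zwsp)
```

For the "I<heart>NY" string the code point count happens to match the four character positions. For the other two strings it does not: two code points render as one accented letter, and three code points occupy two positions.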
Take a look at this:
http://blogs.msdn.com/michkap/archive/2006/02/17/533929.aspx
A string consisting of a couple dozen Unicode characters can still be
rendered as one glyph, and treated as one glyph for the purposes of,
say, selection and caret movement.
Consider this:
http://www.fileformat.info/info/unicode/char/fdfb/index.htm
A single Unicode character that decomposes into eight characters. I'm
not sure how it behaves with respect to caret movement on systems that
are actually capable of rendering it.
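
If you want to see the decomposition for yourself, a minimal sketch along these lines should do, assuming Python's standard unicodedata module and its bundled Unicode database. I am using the compatibility decomposition (NFKD) here; the variable names are just for illustration.

```python
import unicodedata

ligature = "\uFDFB"  # the single code point from the page linked above

# Compatibility decomposition expands the ligature into its parts.
decomposed = unicodedata.normalize("NFKD", ligature)

print(unicodedata.name(ligature))
print(len(ligature), "code point before,", len(decomposed), "after")
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

So the "length" of this text is one or eight code points depending on which normalization form you happen to be holding, and neither number tells you how many caret stops a renderer will give it.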
--
With best wishes,
Igor Tandetnik
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925