Re: wtf is happening here @ bitwise comparison
On Dec 23, 9:03 am, Paavo Helde <myfirstn...@osa.pri.ee> wrote:
tschmittldk <tschmitt...@googlemail.com> wrote in
news:9139bceb-5be4-4653-8b82-d92663dd1...@l32g2000yqc.googlegroups.com:
On 22 Dez., 19:45, tschmittldk <tschmitt...@googlemail.com> wrote:
Okay, thanks for all your answers. I'll try it tomorrow and
post the code then (I left my notebook in my student
flat...). But it seems clearer to me now, thanks!
Okay, now here's the code:
void codevert(char *ArrayToTransform)
{
    int j = 0;
    char *ptr = ArrayToTransform;
    while (*ptr != '\0') {
        if ((*ptr & 0xC0) > 0xbf)
        {
            if (*ptr == '\xc3')
                simplifier_correct(3, ptr++);
            else if (*ptr == '\xc4')
                simplifier_correct(3, ptr++);
            else if (*ptr == '\xc4')
                simplifier_correct(3, ptr++);
            else
                std::cout << "E01";
        }
        ptr++;
    }
}
This is all very brittle.
Yes, but not for the reasons you imply. It's brittle because
it only handles a very small subset of UTF-8. But presumably,
the poster knows that, and accepts that all but a few specific
two-byte sequences will result in "E01". Not to mention the
typo: the last two else-if branches test exactly the same thing.
There's nothing brittle about it at the C++ level.
*ptr is char, which is most probably a signed
type and can be negative.
And is probably 8 bits.
(*ptr & 0xC0) is int and appears to be positive
Not only appears to be: is.
The intermediate values will be unexpected, of course, but the
final result should be correct. (The expression *ptr might be
negative.)
and of the desired value even if *ptr is negative, this is
more by chance and not very portable.
Could you name an architecture where it wouldn't work? And
explain why, and what you'd get. (There is, perhaps, a brittle
part in filling the char[]. Formally, at least, it's possible
that the iostream library rejects negative chars. In
practice, a compiler whose iostream library didn't support this
kind of thing wouldn't be used, so you don't have to worry about it.)
0xbf is int and positive, '\xc3' is char and
negative.
And? In all cases, integral promotion occurs. And when the &
is present, it ensures that the results must be positive.
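To make the promotion argument concrete, here is a minimal sketch, assuming the usual 8-bit char. The helper names top_two_bits and is_c3 are made up for illustration and appear nowhere in the posted code.

```cpp
#include <cassert>

// Hypothetical helper: the masking step from the posted test.
// c is promoted to int before the & is applied. If plain char is
// signed and c holds the byte 0xC3, the promoted value is negative,
// but 0xC0 is a positive int whose sign bit is clear, so the result
// of the & can never be negative.
inline int top_two_bits(char c)
{
    return c & 0xC0;
}

// Hypothetical helper: the comparison from the posted code.
// Both operands have type char and are promoted to int in the same
// way, so the comparison is consistent whether char is signed or not.
inline bool is_c3(char c)
{
    return c == '\xc3';
}
```

With these, top_two_bits('\xc3') gives 0xC0 and top_two_bits('A') gives 0x40, whether or not plain char is signed on the implementation. Comparing a plain char against the int constant 0xc3, on the other hand, would fail where char is signed, which is presumably why the proposed rewrite switches to unsigned char.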
I would rewrite this code about like this:
const unsigned char *ptr =
    reinterpret_cast<unsigned char*>(ArrayToTransform);
while (*ptr) {
    if ((*ptr & 0xC0) > 0xbf)
    {
        if (*ptr == 0xc3)
            // ...
Why bother?
Actually, I'd rewrite the code more fundamentally, to make it
clear what is actually being tested; if nothing else >= 0xC0,
rather than > 0xBF, but more likely with a switch on the results
of *ptr & 0xC0 (with four cases clearly delimiting the
possibilities).
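A rough sketch of what such a switch might look like, assuming 8-bit bytes. The names ByteKind and classify are invented for this example and are not part of the posted code.

```cpp
#include <cassert>

// Hypothetical classification of a single byte of UTF-8 input.
enum ByteKind { Ascii, Continuation, Lead };

inline ByteKind classify(char c)
{
    // The mask keeps only the top two bits, so exactly four
    // values are possible: 0x00, 0x40, 0x80 and 0xC0.
    switch (c & 0xC0) {
    case 0x00:
    case 0x40:
        return Ascii;         // 0xxxxxxx: single-byte character
    case 0x80:
        return Continuation;  // 10xxxxxx: continuation byte
    case 0xC0:
        return Lead;          // 11xxxxxx: lead byte (or an invalid byte)
    }
    return Ascii;             // not reached; keeps compilers quiet
}
```

For example, classify('\xc3') yields Lead and classify('\xa4') yields Continuation. Validating which lead bytes are actually acceptable (and how many continuation bytes must follow) would still have to be done separately.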
--
James Kanze