Re: Writing unsigned char to std::ostream
Ulrich Eckhardt wrote:
My personal gut feeling is that Comeau and g++ are both correct and MSC is
jumping through some hoops for user convenience.
BTW:
unsigned char const utf8_bom[] = { 0xef, 0xbb, 0xbf, 0};
ostrm << utf8_bom << header << std::endl;
Surprisingly, there are in fact overloads for signed and unsigned char in
iostreams.
I think the trouble with your suggestions (and my original code) is that
ostream::put() takes a char argument, and conversion from unsigned char
to char is undefined behavior.
Wasn't that implementation-defined? Implementation-defined is something I'll
live with, but undefined is something I'd rather avoid...
My immediate instinct was to change the "function-style cast" to a
"C-style cast".
Never use C-style casts in C++, they only serve to hide broken code and move
the error detection from compile-time to runtime.
This certainly compiles, but may not run as intended on
some systems (in VC it works I think).
Well, that's what unittests are for.
Hi Ulrich:
Thanks for the reply.
Yes, sorry, I meant that conversion of unsigned char to char is
implementation defined (not undefined). One of the most stupid and
dangerous features in the language, IMHO.
I think there are really two issues here:
1. The use of "constructor syntax" with the unsigned char type.
2. How does ostream deal with unsigned char?
I actually got into this due to Issue 1 (which is easily worked around).
But Issue 2 is much more troubling.
Issue 1
-------
On Comeau, if I write
char c = char(1);
it compiles. But if I write
unsigned char uc = unsigned char(1); // 1
it does not. The same thing happens for other "compound types" like long
int. However, if I do
typedef unsigned char unsigned_char;
unsigned_char uc = unsigned_char(1); // 2
then it works. VC allows both 1 and 2, which seems correct to me.
Issue 2
-------
ostream::put() can only take a char as argument, so using it to output
an unsigned char involves an implementation-defined conversion. This is
why I switched to operator << (which is overloaded for unsigned char).
But what does it do?
In the VC source (in <ostream>) it just casts the unsigned char to char
(using a C-style cast), and then uses ostream::put(). This works (for me)
on VC because 0xFF gives -1 when converted to char. But on some systems
(apparently) 0xFF converts to 127, which seems just crazy to me.
Getting back to my original problem, it seems that I should use your idea
unsigned char const utf8_bom[] = { 0xef, 0xbb, 0xbf, 0};
ostrm << utf8_bom << header << std::endl;
which works because the ostream implementation just casts the pointer to
const char*, not the bits of the individual characters. Thanks for this.
It seems that the ostream overload should be defined as the free function

ostream& operator << (ostream& ostrm, unsigned char uc)
{
    const char* p = reinterpret_cast<const char*>(&uc);
    return ostrm.put(*p);
}

but it is not.
--
David Wilkinson
Visual C++ MVP