Re: Feeding string into ostringstream only uses up to the first null?
* James Kanze:
> On May 31, 12:38 am, "Alf P. Steinbach" <al...@start.no> wrote:
>> * Christopher:
>>> On May 30, 10:18 am, coomberjo...@gmail.com wrote:
>>>> On May 30, 4:47 am, James Kanze <james.ka...@gmail.com> wrote:
>>>>> On May 29, 11:36 pm, coomberjo...@gmail.com wrote:
>>>>>> I have a few std::strings that I am using to store raw binary
>>>>>> data, each of which may very well include null bytes at any
>>>>>> point or points.
>>>>> As others have pointed out, that's probably a design error.
>>>>> However...
>>>> I guess I don't understand why. Strings are designed to be able to
>>>> handle binary data, including nulls.
>>> According to who?
>> You make an interesting point, in a certain sense[1].
> Very much. I've yet to really figure out what std::string was
> designed for: it doesn't really have much support for text
> (despite its name),
std::string has one great feature: it is a /standard/ carrier for short
text strings.
> and as a more general data container, I
> can't imagine a case where std::vector wouldn't be superior.
True.

std::string has no notion of valid character encodings, and it doesn't
interpret contents (except in the null-op sense of the conversions
to/from C-string). But this point of view is the opposite of what
Christopher seems to have been attempting to communicate.
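To make that null-op conversion point concrete, and with it the
truncation in the thread's subject line, here is a minimal sketch (my
example, not from the original posts): the information loss happens in
the conversion to C-string, not in std::string itself.

    #include <iostream>
    #include <sstream>
    #include <string>

    int main()
    {
        // Constructing from a literal stops at the first '\0';
        // passing an explicit length keeps the embedded null.
        std::string truncated( "ab\0cd" );    // size() == 2
        std::string complete( "ab\0cd", 5 );  // size() == 5

        std::ostringstream oss;
        oss << complete;          // writes all 5 bytes
        oss << complete.c_str();  // writes only "ab", stops at '\0'

        std::cout << truncated.size() << ' ' << complete.size()
                  << ' ' << oss.str().size() << '\n';  // "2 5 7"
    }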
> (I've been playing around with UTF-8 a lot lately, and I've
> found that although the interface uses std::string, internally,
> std::vector< Byte >, where Byte is a typedef for unsigned char,
> works a lot better, most of the time.)
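That pattern is easy to sketch. The function names below (asBytes,
asString) are mine, just for illustration; the point is that the
interface stays std::string while the internals work on raw bytes,
where unsigned char avoids sign surprises and there is no C-string
conversion around to silently stop at a '\0':

    #include <string>
    #include <vector>

    typedef unsigned char Byte;

    // Interface type in, internal byte buffer out.
    std::vector<Byte> asBytes( std::string const& s )
    {
        return std::vector<Byte>( s.begin(), s.end() );
    }

    // And back again when handing results to the caller.
    std::string asString( std::vector<Byte> const& bytes )
    {
        return std::string( bytes.begin(), bytes.end() );
    }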
> Of course, if you're talking more generally, the word "string"
> is usually associated with text, and I wouldn't normally expect
> a string to be able to handle [arbitrary] binary data (although it
> should be able to contain any character data, including that which
> contains a '\0' character).
Well, most string classes I've encountered don't interpret contents,
except for automatic switching between wide and narrow representations
with some hardcoded encoding choices; the interpretation of string
element values as character code points is left to the app. But a
string class that did interpret contents could be very useful. So
currently my general expectation is that a "string" can indeed handle
arbitrary binary data, but that may, and probably will, change as we
get smarter, more text-oriented string classes, both in C++ and in
other languages.
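For instance, counting code points rather than bytes in a UTF-8
encoded std::string is something the app has to do itself. A minimal
sketch, assuming valid UTF-8 input (the name utf8Length is mine):

    #include <cstddef>
    #include <string>

    // UTF-8 continuation bytes have the bit pattern 10xxxxxx, so
    // counting the bytes that are NOT continuation bytes counts
    // the code points.
    std::size_t utf8Length( std::string const& s )
    {
        std::size_t n = 0;
        for( std::size_t i = 0; i < s.size(); ++i )
        {
            unsigned char const ch =
                static_cast<unsigned char>( s[i] );
            if( (ch & 0xC0) != 0x80 ) { ++n; }
        }
        return n;
    }

The string class itself just reports size() in bytes; whether those
bytes spell out one-byte or multi-byte sequences is entirely the
app's business.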
The reason seems to be historical: 30 years or so ago one could fairly
safely assume that, for internal string handling, a character was
encoded in some fixed-size unit, typically a byte.

Now we have progressed to variable-length encodings, for network
comms, for files, and even in the Windows API, but our programming
infrastructure, the support in the standard libraries, has not kept up
with the applications. I think when it does, as it seems bound to, the
default expectation will change. :-)
Cheers,
- Alf
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?