Re: Stream, Reader and text vs binary
Russell Wallace wrote:
Suppose one needs to both store (in a file) and transmit (via sockets)
data that will be mostly text, but with the occasional chunk of binary
(e.g. GIF images).
It seems to me that there are three possible ways:
1) Use a Reader (intended for text) and write the binary data directly
as 16 bits to a character.
I assume this _won't_ work, at least not reliably, because various
translations will be done that would mess up the binary data?
If you use a Reader you will need to decide how character data is encoded
on the stream. Beware: much of the Java library is booby-trapped. For
instance, if you use java.io.InputStreamReader(InputStream), you leave
the choice of character encoding to the library. In that case it uses
whatever the machine happens to be set to use (the platform default). If
you choose, say, UTF-8 explicitly, then all valid text will be preserved.
An unpaired surrogate char will not be, though, so arbitrary binary data
packed into chars can still get mangled.
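To make the InputStreamReader point concrete, here is a minimal sketch that names the encoding explicitly instead of leaving it to the platform default. The class and method names (ExplicitCharsetDemo, readAllUtf8) are invented for the example; the only library piece it relies on is the InputStreamReader(InputStream, Charset) constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Reads the whole stream as text, always decoding as UTF-8
    // regardless of what the machine's default charset is.
    static String readAllUtf8(InputStream in) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "na\u00efve \u20ac";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        String decoded = readAllUtf8(new ByteArrayInputStream(bytes));
        System.out.println(decoded.equals(original));
    }
}
```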
2) Use a Reader (intended for text) and encode the binary data as text
in hex, base64 or similar. This would work, though I was hoping for a
more elegant solution.
No, not elegant.
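For completeness, option 2 might look roughly like the sketch below, assuming java.util.Base64 (Java 8 or later) is available. The class and method names (Base64TextDemo, toText, fromText) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class Base64TextDemo {
    // Turns arbitrary bytes into plain ASCII text that is safe to pass through a Writer.
    static String toText(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    // Recovers the original bytes from the Base64 text.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // The six-byte signature that starts a GIF file.
        byte[] gifHeader = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        String text = toText(gifHeader);
        System.out.println(text); // prints R0lGODlh
        System.out.println(Arrays.equals(gifHeader, fromText(text))); // prints true
    }
}
```

The price is size: every 3 bytes of binary become 4 characters of text.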
3) Use a Stream (intended for binary) and write strings as sequences of
16-bit integers.
That should work. char is a 16-bit value. UTF-8 would be more conventional.
Is it safe to do this? That is, if you put a Java String through a
channel that treats it as a literal sequence of 16-bit integers, are you
guaranteed to get the same character sequence out the other end? Or are
there Unicode complications, bank switching to squeeze different chunks
of the 32-bit code point space into the space of 16 bit Java characters,
that sort of thing that might mean (char)1234 on system A doesn't mean
the same character as (char)1234 on system B?
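There are char values that are surrogates, used in pairs. However, the
Unicode code points those pairs represent are U+10000 and above, outside
the 16-bit range, so they do not collide with ordinary char values. So
there should be no loss of information when the chars are sent as literal
16-bit values (although not every sequence of octets represents valid
UTF-8).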
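A small sketch of option 3, to show that a String containing a surrogate pair survives being sent as literal 16-bit values. It uses DataOutputStream.writeChars and DataInputStream.readChar; the length prefix and the names (SixteenBitRoundTrip, writeChars, readChars) are assumptions of this example, not anything the library prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SixteenBitRoundTrip {
    // Writes the number of chars, then each char as two bytes (big-endian).
    static byte[] writeChars(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(s.length());
            out.writeChars(s);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back exactly the chars that writeChars produced.
    static String readChars(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            char[] chars = new char[in.readInt()];
            for (int i = 0; i < chars.length; i++) {
                chars[i] = in.readChar();
            }
            return new String(chars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 'A' followed by U+1F600, which Java stores as the pair D83D DE00.
        String s = "A\uD83D\uDE00";
        System.out.println(s.length());                      // 3 chars
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(s.equals(readChars(writeChars(s))));
    }
}
```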
In general, what's the recommended way to do this - what do people
normally do if they want to put images in an XML file, say? Is there a
fourth way I haven't thought of?
I believe XML documents either refer to binary data held outside the
document (XHTML img, for instance) or embed it with Base64 encoding. You
can have a perfectly valid XML document that is just a Base64 blob between
a pair of tags. XML does not necessarily mean interoperable.
Much better is to use a binary data format, and encode Strings as UTF-8.
You could even cheat and use serialisation, if you don't mind a
Java-only protocol.
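As a rough sketch of the binary-format approach: each record below is a one-byte tag, a four-byte length and the payload, with text payloads encoded as UTF-8. The record layout, the tag values and all the names (MixedRecordDemo, TAG_TEXT, TAG_BINARY, encode, decode) are invented for this example; a real protocol would pick its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MixedRecordDemo {
    // Hypothetical record tags, chosen only for this sketch.
    static final int TAG_TEXT = 0;
    static final int TAG_BINARY = 1;

    // One record: tag byte, payload length, then the payload itself.
    static void writeRecord(DataOutputStream out, int tag, byte[] payload) throws IOException {
        out.writeByte(tag);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Writes one text record followed by one binary record.
    static byte[] encode(String text, byte[] binary) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeRecord(out, TAG_TEXT, text.getBytes(StandardCharsets.UTF_8));
            writeRecord(out, TAG_BINARY, binary);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns a String for each text record and a byte[] for each binary one.
    static List<Object> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<Object> records = new ArrayList<>();
            // available() is a fine end test for an in-memory array;
            // a socket reader would need a different end-of-stream check.
            while (in.available() > 0) {
                int tag = in.readUnsignedByte();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                if (tag == TAG_TEXT) {
                    records.add(new String(payload, StandardCharsets.UTF_8));
                } else {
                    records.add(payload);
                }
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] image = {0x47, 0x49, 0x46, 0x38, 0x39, 0x61, (byte) 0xFF, 0x00};
        List<Object> records = decode(encode("caf\u00e9", image));
        System.out.println(records.get(0));
        System.out.println(((byte[]) records.get(1)).length);
    }
}
```

The same encode and decode logic works unchanged over a FileOutputStream or a socket's streams, since it only depends on DataOutputStream and DataInputStream.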
Tom Hawtin