Re: reading filenames from stdin - with umlauts?
Dan Stromberg <dstromberglists@gmail.com> writes:
Is the java String type -always- 16 bits per character?
Yes (if we ignore surrogate pairs, which are rare and not
used for umlauts).
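A small sketch of what "16 bits per character" means in practice (the class name Utf16Demo is made up for illustration). length() counts 16-bit chars, while codePointCount() counts Unicode code points. An umlaut is one char, but a character outside the Basic Multilingual Plane, such as U+1F600, needs a surrogate pair of two chars.

```java
// Hypothetical demo class: compares 16-bit chars with Unicode code points.
class Utf16Demo {
    // Returns {number of 16-bit chars, number of Unicode code points}.
    static int[] counts(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4";                                   // a-umlaut, a single char
        String outsideBmp = new String(Character.toChars(0x1F600)); // stored as a surrogate pair
        int[] a = counts(umlaut);
        int[] b = counts(outsideBmp);
        System.out.println("U+00E4:  chars=" + a[0] + ", code points=" + a[1]); // chars=1, code points=1
        System.out.println("U+1F600: chars=" + b[0] + ", code points=" + b[1]); // chars=2, code points=1
    }
}
```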
That is, if I try to stick an 8 bit value into a String, is it
always going to be converted to a different encoding that maps
back most of the time, but not always?
The Reader objects already take care of converting between
raw bytes and characters. Strings contain characters;
strictly speaking, they have no "encoding". An encoding only
comes into play when a string is converted to or from a byte[]
or a stream, i.e., when it is encoded or decoded.
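As a rough sketch of where the encoding actually lives, the following decodes the same raw bytes through an InputStreamReader with an explicitly chosen charset (DecodeDemo and its helper names are hypothetical). The byte e4 read as ISO 8859-1 and the bytes c3 a4 read as UTF-8 both yield the single character U+00E4, while c3 a4 read as ISO 8859-1 yields two unrelated characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the charset is a property of the decoding step, not of the String.
class DecodeDemo {
    // Decode raw bytes into characters via a Reader using the given charset.
    static String decode(byte[] raw, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(raw), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream, but Reader.read declares it.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Render each char as U+XXXX so the output does not depend on the console encoding.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xE4 };            // a-umlaut in ISO 8859-1
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA4 }; // a-umlaut in UTF-8
        System.out.println(codeUnits(decode(latin1, StandardCharsets.ISO_8859_1))); // U+00E4
        System.out.println(codeUnits(decode(utf8, StandardCharsets.UTF_8)));        // U+00E4
        System.out.println(codeUnits(decode(utf8, StandardCharsets.ISO_8859_1)));   // U+00C3 U+00A4
    }
}
```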
Do java strings of any sort have an associated but variable encoding?
No. Ignoring surrogate pairs, a string is a sequence of
characters; the value of each character /always/ is the
corresponding Unicode code point.
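A minimal sketch of that point (the class CodePointDemo is hypothetical): widening each char of "d0/ä" to an int gives exactly the Unicode code points U+0064, U+0030, U+002F and U+00E4, with no encoding involved.

```java
// Hypothetical demo class: the value of each char is its Unicode code point
// (for characters in the Basic Multilingual Plane, i.e. no surrogate pairs).
class CodePointDemo {
    static int[] charValues(String s) {
        int[] values = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            values[i] = s.charAt(i); // char widens to int: the 16-bit code unit
        }
        return values;
    }

    public static void main(String[] args) {
        for (int v : charValues("d0/\u00e4")) {
            System.out.printf("U+%04X%n", v); // U+0064, U+0030, U+002F, U+00E4
        }
    }
}
```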
Are there different string types that have different encodings?
No (for the strings of the standard class "java.lang.String").
Is there any way of opening a filename that isn't stored in a String?
Not with the standard classes AFAIK.
~~
To debug, try this:
$ mkdir d0
$ touch d0/ä
$ find d0 -name ä -print | od -h
0000000 6430 2fe4 0a00
0000005
If the filesystem uses ISO 8859-1, you should see "e4" as above
("64302fe4" is "d0/ä").
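For comparison, here is a sketch of what those bytes would look like under two candidate encodings, computed from Java (EncodeDemo and encodeHex are made-up names). In ISO 8859-1 the name "d0/ä" encodes to 64302fe4, matching the od output above; in UTF-8 the umlaut becomes the two bytes c3 a4, so a UTF-8 filesystem would show 64302fc3a4 instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encode a String with a given charset and hex-dump the bytes.
class EncodeDemo {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    static String encodeHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String name = "d0/\u00e4";
        System.out.println("ISO-8859-1: " + encodeHex(name, StandardCharsets.ISO_8859_1)); // 64302fe4
        System.out.println("UTF-8:      " + encodeHex(name, StandardCharsets.UTF_8));      // 64302fc3a4
    }
}
```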
Then read the output of this find from Java and, for debugging,
print each character it reads as a hex code.
If it is "64302fe4", then you have read it correctly (ISO
8859-1 code points agree with Unicode code points here).
Otherwise, you might post here what it is instead.
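One possible sketch of that debug step (FindHex and charsToHex are hypothetical names). It reads lines from stdin through a Reader and prints the hex value of every character. The explicit ISO 8859-1 charset is an assumption; replacing it with new InputStreamReader(System.in), i.e. the platform default, would reproduce what a program relying on the default actually sees. Note that "%02x" is only a minimum width, so a character above U+00FF (for example U+FFFD, the replacement character) prints with more digits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical debug program: print each character of each stdin line as hex.
class FindHex {
    static String charsToHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%02x", (int) s.charAt(i))); // at least two hex digits per char
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: decode stdin as ISO 8859-1; adjust to the encoding under test.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.ISO_8859_1));
        String line;
        while ((line = in.readLine()) != null) { // readLine strips the trailing newline
            System.out.println(charsToHex(line));
        }
    }
}
```

Run it as, e.g., "find d0 -name ä -print | java FindHex"; on an ISO 8859-1 filesystem the expected line is 64302fe4.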
You can also bypass the Reader class, read the "raw bytes"
from the stream, and use their hex dump to get an idea of the
apparent encoding of the stream (post the hex dump here).
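A sketch of that raw-byte variant (RawHexDump and its helpers are hypothetical). It reads System.in as an InputStream, so no charset is involved at all, and prints one two-digit hex value per byte in stream order, 16 per line. Unlike od -h, which groups bytes into 16-bit words in the host's byte order, this keeps the bytes in the order they arrived.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical debug program: hex-dump stdin as raw bytes, bypassing any Reader.
class RawHexDump {
    static String hexDump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(i % 16 == 0 ? '\n' : ' '); // 16 bytes per output line
            }
            sb.append(String.format("%02x", data[i] & 0xff)); // unsigned byte value
        }
        return sb.toString();
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hexDump(readAll(System.in)));
    }
}
```

With the find command above on an ISO 8859-1 filesystem, this should print "64 30 2f e4 0a".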