Re: convert CharArray to ByteArray

From:

charlesbos73 <cbossens73@yahoo.fr>

Newsgroups:

comp.lang.java.programmer

Date:

Mon, 15 Jun 2009 05:46:24 -0700 (PDT)

Message-ID:

<17db339f-f648-4dd5-8698-cf791c0e9d13@q16g2000yqg.googlegroups.com>

On Jun 13, 4:18 am, "Karl Uppiano" <Karl_Uppi...@msn.com> wrote:

"Arne Vajhøj" <a...@vajhoej.dk> wrote in message

news:4a32f5d1$0$90270$14726298@news.sunsite.dk...

hierholzer wrote:

I'm converting an array of char to an array of bytes:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;

  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }

  return ba;
}

I'm getting loss of precision because the primitive type byte is
represented as signed 2s complement. Hence, 0xff causes loss of precision
issues, as with the other bit manipulation statements.

What is the suggested way around this in Java?

Just cast it with (byte).
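
For reference, the cast-only approach might look like the sketch below. It assumes the byte order of the original loop (high byte first, no BOM); the class name CharBytes is made up for illustration. The narrowing cast to byte already keeps only the low 8 bits, so no mask is needed when storing.

```java
class CharBytes {
    // Splits each 16-bit char into two bytes, high byte first (big-endian, no BOM).
    static byte[] convertCharArrayToByteArray(char[] ca) {
        byte[] ba = new byte[ca.length * 2];
        for (int i = 0, j = 0; i < ca.length; ++i, j += 2) {
            ba[j] = (byte) (ca[i] >>> 8); // upper 8 bits; the cast keeps only the low byte
            ba[j + 1] = (byte) ca[i];     // lower 8 bits
        }
        return ba;
    }

    public static void main(String[] args) {
        byte[] ba = convertCharArrayToByteArray(new char[] {'a', '\u20ac'});
        // Mask with 0xff only when printing, to show each byte as unsigned.
        for (byte b : ba) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println(); // prints: 00 61 20 ac
    }
}
```

A byte holding 0xac prints as -84 when treated as a signed number, but the bit pattern is intact, which is why the mask belongs on the reading side rather than the writing side.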

Have you considered:

static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");
}

The latter suggestion would definitely be the best approach if the char
array is actual (UTF-16) characters. You might be lucky, and the char array
is already from UTF-8 or a single-byte charset, but if not, look out! The
loss of precision warning is telling you something.

This is nonsense.

A Java char is well defined.

UTF-16 is also well defined.

This has exactly *nothing* to do with UTF-8 nor "single byte charset".

There's not going to be any "loss of precision" [sic] when
doing:

(new String(ca)).getBytes("UTF-16");

Any character present in "ca" can be encoded in UTF-16
(including characters from Unicode 3.1 and later)
and the whole resulting byte[] can always be reused to
recreate the original char[]. Whether the original char[]
is correctly formed or not by the OP in case Unicode 3.1 and
up codepoints are used is another topic.

I don't care (performance aside) whether the char[] is
internally represented using the color of the boots little
fairies are wearing or is already UTF-16; the fact
is that:

static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");
}

shall *always* produce a byte[] that can be reused to
construct the original char[] (there are exactly zero
issues with UTF-8 or "single byte encoding" [sic] in
this case).
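
A quick way to check the round trip is sketched below. The class name RoundTrip and the inverse method convertByteArrayToCharArray are made up for illustration, and the checked exception is wrapped so the example compiles on its own. The sample array is assumed to be well formed (the last two chars are a valid surrogate pair); an unpaired surrogate would be replaced during encoding, which is the "correctly formed or not" topic set aside above.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class RoundTrip {
    static byte[] convertCharArrayToByteArray(char[] ca) {
        try {
            return new String(ca).getBytes("UTF-16");
        } catch (UnsupportedEncodingException e) {
            // Not expected in practice; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical inverse: decodes the bytes (BOM included) back into chars.
    static char[] convertByteArrayToCharArray(byte[] ba) {
        try {
            return new String(ba, "UTF-16").toCharArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII, Latin-1, BMP, and one supplementary character as a surrogate pair.
        char[] original = {'a', '\u00e9', '\u20ac', '\ud835', '\udd0a'};
        char[] restored = convertByteArrayToCharArray(convertCharArrayToByteArray(original));
        System.out.println(Arrays.equals(original, restored)); // prints: true
    }
}
```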

Note that:

System.out.println( convertCharArrayToByteArray( new char[]
{'a'} ).length );

shall print '4', and the OP probably wants to read up on
what a BOM is if he decides to use this method.
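
To see where the two extra bytes come from, here is a small sketch that dumps the encoded bytes for "a" under three charset names. The class BomDemo and its helpers hex and encode are made up for illustration. In Java, encoding with "UTF-16" writes a big-endian byte-order mark first, while "UTF-16BE" and "UTF-16LE" write none.

```java
import java.nio.charset.Charset;

class BomDemo {
    // Formats bytes as space-separated, unsigned, two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("a", "UTF-16")));   // prints: fe ff 00 61
        System.out.println(hex(encode("a", "UTF-16BE"))); // prints: 00 61
        System.out.println(hex(encode("a", "UTF-16LE"))); // prints: 61 00
    }
}
```

If the OP wants exactly two bytes per char, as in his hand-written loop, "UTF-16BE" is the name that matches it.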

P.S.: Whether or not the UTF-16 encoding is mandated to be present
for the JVM to be compliant is a question better left to
the JLS-nazi bot, who shall recognize himself. Note that
if it is mandatory, then you have to stupidly catch an
exception that can never happen, just like when
you do getBytes("UTF-8") (UTF-8 is mandatory for the JVM
to be compliant, which begs the question of why we don't
have a getUTF8Bytes() method, but I digress, and the JLS-nazi
bot can certainly explain why the Java designers were right
when they mandated UTF-8 to be a supported JVM encoding
but did not provide a getUTF8Bytes() method, as everything
in Java is holy and has a logical explanation).
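
For what it's worth, the pointless catch block can be avoided with the String.getBytes(Charset) overload added in Java 6, which declares no checked exception. The sketch below is one way to get the wished-for helper; the class name Utf8Bytes is made up, and getUTF8Bytes is a hypothetical stand-in, not a JDK method. The java.nio.charset.Charset documentation lists both UTF-8 and UTF-16 among the charsets every Java platform implementation must support.

```java
import java.nio.charset.Charset;

class Utf8Bytes {
    // Looked up once; UTF-8 is one of the charsets every Java platform must support.
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // Hypothetical stand-in for the getUTF8Bytes() method wished for above.
    static byte[] getUTF8Bytes(String s) {
        return s.getBytes(UTF_8); // this overload declares no checked exception
    }

    public static void main(String[] args) {
        System.out.println(getUTF8Bytes("a").length);      // prints: 1
        System.out.println(getUTF8Bytes("\u20ac").length); // prints: 3
    }
}
```

On later Java versions (7 and up), the constant java.nio.charset.StandardCharsets.UTF_8 can replace the Charset.forName lookup.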