Re: Unicode chinese
On Wed, 29 Aug 2007 16:22:45 GMT, "Crouchez"
<blah@bllllllahblllbllahblahblahhh.com> wrote, quoted or indirectly
quoted someone who said :
b.length = 6. But why 6, when I thought Chinese characters take up 2 bytes
per character?
I suspect your parents punished you for curiosity as a toddler.
EXPERIMENT!
import java.io.UnsupportedEncodingException;
public class Chinese
{
/**
* test harness
*
* @param args not used
*/
public static void main ( String[] args ) throws UnsupportedEncodingException
{
System.out.println( System.getProperty( "file.encoding" ));
String chinese = "\u4e2d\u5c0f";
// explicit choice of encoding, UTF-8 supports everything
// including Chinese.
byte[] b = chinese.getBytes( "UTF-8" );
for ( int i=0; i<b.length; i++ )
{
System.out.println( Integer.toHexString( 0xff & b[i] ));
}
// prints
// Cp1252
// e4
// b8
// ad
// e5
// b0
// 8f
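// why those bytes?
// The UTF-8 BOM is ef bb bf, so that is not it.
// see http://mindprod.com/jgloss/utf.html#UTF8ENCODER
// Code points from 0x800 through 0xffff take 3 bytes each in UTF-8.
// 0x4e2d and 0x5c0f both fall in that range: 2 chars * 3 bytes = 6.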
}
}
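
If you want to see where the "2 bytes per character" idea comes from, here is
a rough sketch that compares the same string under UTF-8 and UTF-16. The class
name EncodingSizes and the helper utf8Length are made up for illustration, and
it assumes Java 7 or later for StandardCharsets. The 2-byte figure holds for
UTF-16 (for chars like these two), not for UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes
{
    /**
     * Hypothetical helper: number of bytes UTF-8 needs for one code point.
     *
     * @param codePoint Unicode code point, 0 .. 0x10ffff
     * @return 1, 2, 3 or 4
     */
    public static int utf8Length ( int codePoint )
    {
        if ( codePoint < 0x80 ) return 1;
        if ( codePoint < 0x800 ) return 2;
        if ( codePoint < 0x10000 ) return 3;
        return 4;
    }

    public static void main ( String[] args )
    {
        String chinese = "\u4e2d\u5c0f";
        // UTF-8: 3 bytes for each of these two chars
        System.out.println( chinese.getBytes( StandardCharsets.UTF_8 ).length );
        // UTF-16BE (no BOM): 2 bytes for each of these two chars
        System.out.println( chinese.getBytes( StandardCharsets.UTF_16BE ).length );
        for ( int i = 0; i < chinese.length(); i++ )
        {
            System.out.println( Integer.toHexString( chinese.charAt( i ) )
                                + " needs " + utf8Length( chinese.charAt( i ) )
                                + " bytes in UTF-8" );
        }
    }
}
```

Using the StandardCharsets constants also avoids the checked
UnsupportedEncodingException that getBytes( "UTF-8" ) forces you to declare.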
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com