On Wed, 29 Aug 2007 03:47:16 GMT, "Crouchez"
<blah@bllllllahblllbllahblahblahhh.com> wrote, quoted or indirectly
quoted someone who said :
> String chinese = "\u4e2d\u5c0f";
> System.out.println(chinese.getBytes().length);
> Why does this return 2?
I modified your code a little to make the problem clear:
public class Chinese
{
    /**
     * test harness
     *
     * @param args not used
     */
    public static void main ( String[] args )
    {
        // show which default encoding getBytes() will use
        System.out.println( System.getProperty( "file.encoding" ));
        String chinese = "\u4e2d\u5c0f";
        // no encoding given, so the platform default is used
        byte[] b = chinese.getBytes();
        for ( int i=0; i<b.length; i++ )
        {
            System.out.println( b[i]);
        }
        // prints
        // Cp1252
        // 63
        // 63
        // in other words ??. Those two chars are not available in your
        // default encoding, so each one is replaced by a single '?' byte.
    }
}
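
You can also ask an encoding up front whether it can represent a string, instead of discovering the ?s afterwards. What follows is only a rough sketch: the class and method names are mine (hypothetical), it needs Java 7 or later for StandardCharsets, and it assumes your JVM has windows-1252 installed (that is the canonical name for Cp1252).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class EncodingCheck
{
    /**
     * @param text string to test
     * @param cs   charset to test it against
     * @return true if every char of text can be encoded in cs
     */
    static boolean canEncode ( String text, Charset cs )
    {
        return cs.newEncoder().canEncode( text );
    }

    public static void main ( String[] args )
    {
        String chinese = "\u4e2d\u5c0f";
        // assumption: windows-1252 is available in this JVM
        Charset cp1252 = Charset.forName( "windows-1252" );
        System.out.println( canEncode( chinese, cp1252 ) );
        System.out.println( canEncode( chinese, StandardCharsets.UTF_8 ) );
        // getBytes does not complain about unmappable chars;
        // it substitutes the charset's replacement byte instead.
        byte[] b = chinese.getBytes( cp1252 );
        System.out.println( b.length );
    }
}
```

Under those assumptions I would expect false, then true, then 2: one substituted byte per character, which is where the original 2 comes from.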
I further modified your code to choose the encoding explicitly:
import java.io.UnsupportedEncodingException;

public class Chinese
{
    /**
     * test harness
     *
     * @param args not used
     */
    public static void main ( String[] args ) throws UnsupportedEncodingException
    {
        System.out.println( System.getProperty( "file.encoding" ));
        String chinese = "\u4e2d\u5c0f";
        // explicit choice of encoding, designed to support Chinese.
        byte[] b = chinese.getBytes( "Big5-HKSCS" );
        for ( int i=0; i<b.length; i++ )
        {
            // mask with 0xff to print each byte as unsigned 0..255
            System.out.println( 0xff & b[i]);
        }
        // prints
        // Cp1252
        // 164
        // 164
        // 164
        // 112
        // more like you would expect.
    }
}
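
If you don't specifically need a Chinese encoding, UTF-8 is another explicit choice that can represent these characters. Here is a minimal sketch, assuming Java 7 or later; the class and method names are made up for illustration. Passing a Charset object rather than a name also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class ChineseUtf8
{
    /**
     * @param text string to encode
     * @return the UTF-8 bytes of text as unsigned values 0..255
     */
    static int[] unsignedUtf8 ( String text )
    {
        byte[] b = text.getBytes( StandardCharsets.UTF_8 );
        int[] result = new int[ b.length ];
        for ( int i=0; i<b.length; i++ )
        {
            // mask off sign extension so the value is 0..255
            result[ i ] = 0xff & b[ i ];
        }
        return result;
    }

    public static void main ( String[] args )
    {
        for ( int value : unsignedUtf8( "\u4e2d\u5c0f" ) )
        {
            System.out.println( value );
        }
        // prints
        // 228
        // 184
        // 173
        // 229
        // 176
        // 143
    }
}
```

Each of the two characters takes three bytes in UTF-8, so the length is 6 rather than the 4 you get with Big5-HKSCS.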
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com