Re: Read utf-8 file return utf-16 coding hex string ?
On Jan 29, 3:59 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:
moonhkt wrote:
Hi All
Why, when using UTF-8, are the hex values returned 51cc and 6668?
od -cx utf8_file01.text
22e5 878c e699 a822
with " before and after
I don't understand the above. Are you trying to suggest that the text
'with " before and after' is part of the output of the "od" program? If
so, why does it not appear to match up with the binary values written
out? And if the characters you're concerned with are at index 101 and
102, why only eight bytes in the file? And if the file is UTF-8, why
are you dumping its contents as shorts? Why not just bytes?
Frankly, the whole question doesn't make much sense to me. That said,
the basic answer to your question is, I believe: UTF-8 and UTF-16 are
different, so of course the bytes used to represent a character in a
UTF-8 file are going to look different from the bytes used to represent
the same character in a UTF-16 data structure.
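To make the difference concrete, here is a minimal Java sketch that encodes the same character, U+51CC, both ways and prints the resulting bytes. The class name EncodingCompare and the helper toHex are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class EncodingCompare {
    // Hypothetical helper: format bytes as space-separated lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Bytes of the string when encoded as UTF-8.
    static String utf8Hex(String s) {
        return toHex(s.getBytes(StandardCharsets.UTF_8));
    }

    // Bytes of the string when encoded as big-endian UTF-16, no byte-order mark.
    static String utf16Hex(String s) {
        return toHex(s.getBytes(StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        String s = "\u51cc"; // a single character, U+51CC
        System.out.println("UTF-8   : " + utf8Hex(s));  // e5 87 8c
        System.out.println("UTF-16BE: " + utf16Hex(s)); // 51 cc
    }
}
```

The same code point comes out as three bytes in UTF-8 and as one 16-bit unit in UTF-16, which is why a dump of the file never shows 51cc.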
Pete
System : AIX 5.3
The text file contains just two UTF-8 Chinese characters.
cat out_utf.text
凌晨
od -cx out_utf.text
0000000 207 214 231 \n
e587 8ce6 99a8 0a00
0000007
Below is the Java program I used to build the UTF-8 data. The input uses
UTF-16 values because I do not know how to input UTF-8 hex values.
My question: if I input UTF-16 hex values and write to the file with the
UTF8 codepage, will the data be encoded as UTF-8?
Do you know how to input the hex values of UTF-8 bytes? I tried \0xe5,
but it does not work.
import java.io.*;
public class build_utf01 {
    public static void main(String[] args)
            throws UnsupportedEncodingException {
        // I want console output in UTF-8
        PrintStream sysout = new PrintStream(System.out, true, "UTF-8");
        try {
            File oFile = new File("out_utf.text");
            // The writer encodes Java's UTF-16 chars to UTF-8 bytes on output
            BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(oFile), "UTF8"));
            /* http://www.fileformat.info/info/unicode/char/51cc/index.htm
               UTF-8 (hex)  0xe5 0x87 0x8c (e5878c)
               UTF-16 (hex) 0x51CC (51cc)
               http://www.fileformat.info/info/unicode/char/6668/index.htm
               UTF-8 (hex)  0xe6 0x99 0xa8 (e699a8)
               UTF-16 (hex) 0x6668 (6668)
            */
            String a = "\u51cc\u6668";
            int n = a.length();
            sysout.println("GIVEN STRING IS=" + a);
            sysout.printf("Length of string is %d%n", n);
            sysout.printf("CodePoints in string is %d%n", a.codePointCount(0, n));
            for (int i = 0; i < n; i++) {
                sysout.printf("Character[%d] is %s%n", i, a.charAt(i));
                out.write(a.charAt(i));
            }
            out.newLine();
            out.close();
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}
Output on a UTF-8 enabled terminal:
java build_utf01
GIVEN STRING IS=凌晨
Length of string is 2
CodePoints in string is 2
Character[0] is 凌
Character[1] is 晨
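On the question of entering UTF-8 hex values directly: Java has no \0xe5 escape, so one possible approach is to put the raw byte values into a byte[] and decode it as UTF-8. The sketch below does that, and also checks that a String built from UTF-16 escapes comes out as the same UTF-8 bytes when written through an OutputStreamWriter, as the program above does. The class name Utf8Bytes and the helpers fromUtf8 and writeAsUtf8 are hypothetical, it writes to memory instead of out_utf.text, and it assumes Java 7 or later for StandardCharsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Hypothetical helper: decode raw UTF-8 byte values into a Java String.
    static String fromUtf8(int... hexValues) {
        byte[] raw = new byte[hexValues.length];
        for (int i = 0; i < raw.length; i++) {
            // The cast is needed because 0xe5 is outside the signed byte range.
            raw[i] = (byte) hexValues[i];
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: encode a String through an OutputStreamWriter
    // into an in-memory buffer and return the bytes that were produced.
    static byte[] writeAsUtf8(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8);
            out.write(s);
            out.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Build the string from its UTF-8 bytes instead of UTF-16 escapes.
        String fromBytes = fromUtf8(0xe5, 0x87, 0x8c, 0xe6, 0x99, 0xa8);
        System.out.println(fromBytes.equals("\u51cc\u6668")); // true

        // Going the other way: UTF-16 escapes in, UTF-8 bytes out.
        for (byte b : writeAsUtf8("\u51cc\u6668")) {
            System.out.printf("%02x ", b & 0xff); // e5 87 8c e6 99 a8
        }
        System.out.println();
    }
}
```

Both routes give the same String, so the escapes in the source only choose how the characters are typed in. The encoding named on the writer decides which bytes land in the file.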