Re: change ISO8859-1 to GB2312
On May 20, 12:50 AM, Lew <no....@lewscanon.com> wrote:
On 05/19/2010 02:40 AM, moonhkt wrote:
Our database code page is ISO-8859-1, but some of the data was entered as
GB2312. When we export that data in ISO-8859-1 format, it still contains the
GB2312 data. Is it possible to convert the export from ISO-8859-1 to GB2312?
The machine runs AIX.
I tried the code below, but it does not work.
import java.nio.charset.Charset ;
import java.io.*;
import java.lang.String;
public class read_iso {
You should follow the Java naming conventions.
public static void main(String[] args) {
File aFile = new File("abc.txt");
try {
... and indentation conventions.
String str = "";
And not initialize to values that are never used, only discarded.
BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(aFile),
"iso8859-1"));
while (( str = in.readLine()) != null )
{
System.out.println(str);
System.out.println(new String (str.getBytes("iso8859-1")));
Didn't you say the data was input in GB2312 encoding?
Whatever, this constructs a string using the platform native encoding from
bytes encoded using ISO-8859-1. If that isn't the native encoding, you got
worries.
System.out.println(new String
(str.getBytes("iso-8859-1"),"GB2312")); /* not */
Now you're decoding bytes using GB2312 from bytes encoded using ISO-8859-1.
That can't work.
System.out always uses the platform default string encoding.
}
} catch (UnsupportedEncodingException e) {
} catch (IOException e) {
}
Don't silently eat exceptions.
}
}
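To make the point about decoding concrete, here is a minimal sketch (not from the original posts) that decodes the same four bytes once as GB2312 and once as ISO-8859-1. The class name `DecodeDemo` and the helper `decode` are made up for illustration, and the sketch assumes the JRE ships the GB2312 charset, which a full JDK normally does. The sample word is written with Unicode escapes so the encoding of the source file itself does not matter.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the code discussed in this thread.
class DecodeDemo {

    /** Decodes raw bytes with the named charset; the platform default is never consulted. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The two-character test word from the sample data, as Unicode escapes.
        String original = "\u6D4B\u8BD5";

        // Simulate what is stored on disk: the word encoded as GB2312 (two bytes per character).
        byte[] raw = original.getBytes(Charset.forName("GB2312"));

        String asGb2312 = decode(raw, "GB2312");     // decoded with the charset the data was written in
        String asLatin1 = decode(raw, "ISO-8859-1"); // decoded with the wrong charset: one char per byte

        // Wrap System.out so the output encoding is explicit rather than the platform default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("GB2312 decode matches original:     " + asGb2312.equals(original));
        out.println("ISO-8859-1 decode matches original: " + asLatin1.equals(original));
        out.println("lengths: " + asGb2312.length() + " vs " + asLatin1.length());
    }
}
```

Only the GB2312 decoding gives back the two-character word. The ISO-8859-1 decoding produces four unrelated Latin-1 characters, one per byte, which is the kind of garbage seen when the reader's charset does not match the data.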
My approach to the encoding would be a lot more straightforward. None of this
wacky "new String()" stuff.
<sscce source="eegee/FooCoder.java">
package eegee;
import java.io.*;
import org.apache.log4j.Logger;
import static org.apache.log4j.Logger.getLogger;
public class FooCoder
{
    private transient final Logger logger = getLogger( FooCoder.class );
    public static void main( String[] args )
    {
        new FooCoder().recode();
    }
    public void recode()
    {
        final BufferedReader rin;
        final BufferedWriter owt;
        try
        {
            rin = new BufferedReader( new InputStreamReader(
                getClass().getResourceAsStream( "temp.txt" ),
                "ISO-8859-1" ));
            owt = new BufferedWriter( new OutputStreamWriter(
                System.out, "GB2312" ));
        }
        catch ( IOException exc )
        {
            logger.error( exc );
            return;
        }
        try
        {
            for ( String str; (str = rin.readLine()) != null; )
            {
                owt.write( str );
                owt.newLine();
            }
            owt.flush();
        }
        catch ( IOException exc )
        {
            logger.error( exc );
        }
        finally
        {
            try
            {
                rin.close();
                owt.close();
            }
            catch ( IOException exc )
            {
                logger.error( exc );
            }
        }
    }
}
</sscce>
--
Lew
Hi Lew
Thanks a lot.
How do I check the platform's native encoding?
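One way to answer the platform-encoding question, as a sketch: `Charset.defaultCharset()` reports the charset the JVM falls back to when none is specified, and the `file.encoding` system property usually carries the same information. The class name `ShowDefaultCharset` is invented here. The printed value depends on the machine and its locale settings, so no expected output is shown.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
class ShowDefaultCharset {

    /** Returns the canonical name of the JVM's default charset. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Charset used by APIs such as new String(byte[]) and String.getBytes()
        // when no charset argument is passed.
        System.out.println("default charset: " + defaultCharsetName());

        // System property that normally reflects the same setting.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```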
I changed your code as shown below. My test file now converts to UTF-8.
Viewed in Reflection with UTF-8 emulation, the characters display correctly.
They also display correctly in IE.
temp.txt file
| 10 TEST1 |测试1
| |
| 11 TEST2 |测试2
| |
| 12 TEST3 |测试3
| |
| 13 TEST4 |测试4
| |
| 14 TEST5 |测试5
| |
import java.io.*;
public class conv_ig
{
    public static void main( String[] args )
    {
        new conv_ig().recode();
    }
    public void recode()
    {
        final BufferedReader rin;
        final BufferedWriter owt;
        try
        {
            /* Original version: read as ISO-8859-1, write as GB2312.
            rin = new BufferedReader( new InputStreamReader(
                getClass().getResourceAsStream( "temp.txt" ), "ISO-8859-1" ));
            owt = new BufferedWriter( new OutputStreamWriter(
                System.out, "GB2312" ));
            */
            rin = new BufferedReader( new InputStreamReader(
                getClass().getResourceAsStream( "temp.txt" ), "GB2312" ));
            owt = new BufferedWriter( new OutputStreamWriter(
                System.out, "UTF-8" ));
        }
        catch ( IOException exc )
        {
            /* logger.error( exc ); */
            return;
        }
        try
        {
            for ( String str; (str = rin.readLine()) != null; )
            {
                owt.write( str );
                owt.newLine();
            }
            owt.flush();
        }
        catch ( IOException exc )
        {
            /* logger.error( exc ); */
        }
        finally
        {
            try
            {
                rin.close();
                owt.close();
            }
            catch ( IOException exc )
            {
                /* logger.error( exc ); */
            }
        }
    }
}
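The same GB2312-to-UTF-8 conversion can also be sketched file to file with try-with-resources, which closes both streams even when an exception is thrown. This assumes Java 7 or later (`java.nio.file.Files`), which postdates this thread. The class name `Gb2312ToUtf8` and its `convert` method are hypothetical, and the JRE is assumed to provide the GB2312 charset.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical utility class, a sketch rather than the code posted above.
class Gb2312ToUtf8 {

    /** Reads source as GB2312 text and writes the same lines to target as UTF-8. */
    static void convert(Path source, Path target) throws IOException {
        Charset gb2312 = Charset.forName("GB2312");
        // Both resources are closed automatically, in reverse order of creation.
        try (BufferedReader in = Files.newBufferedReader(source, gb2312);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line; (line = in.readLine()) != null; ) {
                out.write(line);
                out.newLine(); // platform line separator, as in the posted code
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Gb2312ToUtf8 <gb2312-input> <utf8-output>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The reader returned by `Files.newBufferedReader` throws an `IOException` on byte sequences that are malformed for the given charset instead of quietly substituting replacement characters, so a wrong guess about the input encoding tends to fail loudly rather than produce garbled output.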