Re: Read utf-8 file return utf-16 coding hex string ?
moonhkt wrote:
> Thanks for the documents on UTF-8. Actually, my company wants to use an
> ISO8859-1 database to store UTF-8 data. Currently, our EDI just handles
That statement doesn't make sense. What would make sense is, "My company
wants to store characters with an ISO8859-1 encoding". There isn't really any
such thing as "UTF-8 data". What there is, is character data. Others
upthread have explained this; you might wish to review what people told you
about how data in a Java 'String' is always UTF-16. You read it into the
'String' by giving the 'Reader' an encoding argument that matches the encoding
of the source, and you write it to the destination with whatever encoding you
need in the 'Writer'.
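
To make the 'Reader'/'Writer' point concrete, here is a minimal sketch. The class name 'TranscodeSketch', the method 'transcode' and the sample bytes are made up for illustration, and it uses in-memory streams where you would use files; the same pattern should apply with 'FileInputStream' and 'FileOutputStream'.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode bytes with one charset, re-encode with another.
class TranscodeSketch {
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The Reader decodes source bytes into UTF-16 chars;
        // the Writer encodes those chars into the target charset.
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(sink, to)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The Writer is closed (and flushed) by now, so the sink is complete.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // "café" encoded as UTF-8: the é takes two bytes (c3 a9).
        byte[] utf8 = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};
        byte[] latin1 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length); // 4: the é is the single byte e9 in ISO-8859-1
    }
}
```

Note that 'OutputStreamWriter' quietly substitutes a replacement byte for any character the target charset cannot represent, so a conversion like this can lose data without throwing.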
> ISO8859-1 codepage. We want to test importing UTF-8 data. One type of EDI
The term "UTF-8 data" has no meaning.
> with UTF-8 data can be imported, processed and loaded into our database.
> Then we exported the data to the default codepage, IBM850, and found e5 87 8c
> e6 99 a8 in the file. The export file is a mix of ISO8859-1 characters and
> UTF-8 characters.
You simply map the 'String' data to the database column using JDBC. The
connection and JDBC driver handle the encoding, AIUI.
<http://java.sun.com/javase/6/docs/api/java/sql/PreparedStatement.html#setString(int,%20java.lang.String)>
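
As for the bytes e5 87 8c e6 99 a8 you found in the export: assuming they really are UTF-8, a sketch like the following shows what they decode to once they are in a 'String'. The class and method names ('ExportBytesSketch', 'utf16Hex') are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decode the six bytes from the export as UTF-8.
class ExportBytesSketch {
    // Render each UTF-16 code unit of the string as four hex digits.
    static String utf16Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] exported = {
            (byte) 0xE5, (byte) 0x87, (byte) 0x8C,
            (byte) 0xE6, (byte) 0x99, (byte) 0xA8
        };
        String text = new String(exported, StandardCharsets.UTF_8);
        System.out.println(text.length());  // 2: six bytes decode to two characters
        System.out.println(utf16Hex(text)); // 51cc 6668
        // Neither character exists in ISO-8859-1:
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode(text)); // false
    }
}
```

So those six bytes are two characters, U+51CC and U+6668, and neither has any representation in ISO-8859-1. If they survived the trip through the database, it is most likely because the bytes were passed through untouched, not because the characters were stored as characters.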
> The next test is loading all possible UTF-8 characters into our database,
> then exporting the loaded data to a file, to compare the two files. If the two
> are no different, we may have proof that loading UTF-8 into an ISO8859-1
> database has no bad effects.
There are an *awful* lot of Unicode characters, over 107,000. Most cannot be
encoded in ISO-8859-1, which handles only 256 characters.
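
You can check that figure yourself instead of loading a database. This is only a sketch, with the made-up names 'EncodableCountSketch' and 'countEncodable'; it asks a 'CharsetEncoder' about every Unicode code point except the surrogates, which are not characters on their own.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: count the code points a charset can encode.
class EncodableCountSketch {
    static int countEncodable(Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // Skip the surrogate range U+D800..U+DFFF.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            if (encoder.canEncode(new String(Character.toChars(cp)))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEncodable(StandardCharsets.ISO_8859_1)); // 256
    }
}
```

Everything outside those 256 code points either gets replaced on the way in or has to be smuggled through as raw bytes, which is what your comparison test would end up measuring.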
> Our database is a Progress database in character mode, running on an AIX 5.3
> machine.
> Next task: try to build all possible UTF-8 characters into a file for the
> loading test.
> Any suggestions?
That'll be a rather large file.
Why don't you Google for character encoding and what different encodings can
handle?
Also:
<http://en.wikipedia.org/wiki/Unicode>
<http://en.wikipedia.org/wiki/ISO-8859-1>
--
Lew