Re: Facing exception: Invalid byte 2 of 4-byte UTF-8 sequence.
dk wrote:
@BugBear: yeah the xml [sic] is a well formed and properly validated xml [sic].
That didn't answer his question. Answer his question.
"Have you checked that your data IS valid UTF-8 ?"
Clearly there is an improperly-encoded character in your XML file.
Find that and fix it.
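One way to find it, as a rough sketch rather than the definitive tool: run the raw bytes of the file through a strict UTF-8 decoder and note where it stops. The class name `Utf8Check` and the method `firstInvalidOffset` are made up for this illustration; only the `java.nio.charset` calls are standard library.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of any library.
final class Utf8Check {
    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if everything decodes. */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            // endOfInput = true, so a truncated trailing sequence counts as malformed
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the malformed bytes begin at the buffer's position
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: discard the decoded chars and keep going
        }
    }
}
```

Feed it the bytes of the XML file, read as bytes and not through a Reader, then look at that offset in the hex editor.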
@Roedy: right now I'm using UltraEdit and inserting the characters
from the ASCII table that it has. I have even tried seeing it in hex
mode and I got the same value from both the places.
ASCII != UTF-8.
That hex value for the bad character, does it match the UTF-8 byte
sequence for that character? Is it four bytes long? What character is
it, and what is the hex value you observe? (Note: that's four
questions, so there ought to be four answers.)
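To answer those, it helps to know what the UTF-8 bytes for a given character should be, so you can compare them with what the hex editor shows. A small sketch, assuming you know the character's Unicode code point; `Utf8Hex` and `utf8Hex` are names invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper for comparing against a hex editor's display.
final class Utf8Hex {
    /** Returns the UTF-8 encoding of one Unicode code point as space-separated hex bytes. */
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        ByteBuffer bytes = Charset.forName("UTF-8").encode(s);
        StringBuilder hex = new StringBuilder();
        while (bytes.hasRemaining()) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", bytes.get() & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(0x41));    // 41           (ASCII: one byte)
        System.out.println(utf8Hex(0xE9));    // C3 A9        (two bytes)
        System.out.println(utf8Hex(0x20AC));  // E2 82 AC     (three bytes)
        System.out.println(utf8Hex(0x1F600)); // F0 9F 98 80  (four bytes)
    }
}
```

If the editor shows a single byte such as E9 where this prints C3 A9, the file is not UTF-8 at that point.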
Meanwhile I have found something more interesting while reading the
input stream from my xml [sic] if I exclusively define it to be formatted to
UTF-8 in getByteStream it is working fine. Now here is this a Java bug
(1.5.0.12)? or something else?
It's not a Java bug.
Now this has led to a confusion. I thought ISO-8859-1 is a charset
Did you mean "encoding"?
which is subset of UTF-8. Then why didn't UTF-8 work whereas
ISO-8859-1 worked?
Because you were wrong. The two encodings differ.
If you have an assumption, let's call it a hypothesis, and the
evidence contradicts the hypothesis, then the hypothesis is wrong.
Simple.
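To make "the two encodings differ" concrete, here is a sketch that encodes the same text both ways and then tries decoding each result with the other charset. The two encodings agree only on the ASCII range. The class and method names are invented for the example, and the sample strings are arbitrary.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

// Hypothetical demo class.
final class EncodingsDiffer {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** Encodes text in the given charset and returns exactly the encoded bytes. */
    static byte[] encode(String text, Charset cs) {
        ByteBuffer buf = cs.encode(text);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    /** True if the bytes decode in the given charset without a coding error. */
    static boolean decodesAs(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String accented = "caf\u00E9"; // e-acute: E9 in ISO-8859-1, C3 A9 in UTF-8
        System.out.println(Arrays.equals(encode(ascii, LATIN1), encode(ascii, UTF8)));       // true
        System.out.println(Arrays.equals(encode(accented, LATIN1), encode(accented, UTF8))); // false
        System.out.println(decodesAs(encode(accented, LATIN1), UTF8)); // false
        System.out.println(decodesAs(encode(accented, UTF8), LATIN1)); // true
    }
}
```

The last line is why reading as ISO-8859-1 "works": every byte value is a legal ISO-8859-1 character, so the decoder never complains, even when the resulting characters are not the ones intended.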
--
Lew