Re: Facing exception: Invalid byte 2 of 4-byte UTF-8 sequence.

From: Lew <lew@lewscanon.com>
Newsgroups: comp.lang.java.programmer
Date: Thu, 21 Jan 2010 11:43:13 -0800 (PST)
Message-ID: <ee117ef7-2dad-48ae-8bd7-112db81462e6@d30g2000vbl.googlegroups.com>

dk wrote:

> @BugBear: yeah the xml [sic] is a well formed and properly validated xml [sic].

That didn't answer his question. Answer his question.
"Have you checked that your data IS valid UTF-8 ?"

Clearly there is an improperly encoded character in your XML file.
Find that and fix it.
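One way to answer BugBear's question is to run the raw bytes of the file through a strict UTF-8 decoder and see where it stops. This is a minimal sketch, assuming Java 7 or later (for `StandardCharsets` and `Files`); the class name `Utf8Check` and method `firstInvalidUtf8Offset` are hypothetical. It uses only `java.nio.charset.CharsetDecoder` configured to report malformed input instead of silently substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    /**
     * Returns the byte offset at which the first malformed UTF-8 sequence
     * starts, or -1 if the whole array is valid UTF-8.
     */
    static int firstInvalidUtf8Offset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // stop instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than there are input bytes,
        // so this output buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        // On a malformed-input error the input position is left at the
        // first byte of the offending sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>"); // hypothetical invocation
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidUtf8Offset(data);
        System.out.println(offset < 0 ? "valid UTF-8" : "first malformed sequence at byte offset " + offset);
    }
}
```

The offset it reports is the place to look in the editor's hex mode.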

> @Roedy: write now I'm using ultraEdit and inserting the characters
> from the ASCII table that it has. I have even tried seeing it in hex
> mode and I got the same value from both the places.

ASCII != UTF-8.

That hex value for the bad character, does it match the UTF-8 encoding
of that character's code point? Is it four bytes long? What character is
it, and what is the hex value you observe? (Note: that's four
questions, so there ought to be four answers.)
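To answer those questions you need the actual UTF-8 bytes for the character's code point, which need not match what an editor's "ASCII table" inserts. This is a minimal sketch, assuming Java 7 or later; the class and method names are illustrative only. For reference, UTF-8 needs four bytes only for code points U+10000 and above; a character such as U+00F6 (ö) takes two.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {

    /** Formats bytes as space-separated two-digit uppercase hex, e.g. "C3 B6". */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    /** Returns the UTF-8 encoding of a single Unicode code point, as hex. */
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return toHex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("U+00F6  -> " + utf8Hex(0x00F6));  // C3 B6 (two bytes)
        System.out.println("U+1F600 -> " + utf8Hex(0x1F600)); // F0 9F 98 80 (four bytes)
    }
}
```

Compare that output with the bytes the hex view shows for the offending character.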

> Meanwhile I have found something more interesting while reading the
> input stream from my xml [sic] if I exclusively define it to be formatted to
> UTF-8 in getByteStream it is working fine. Now here is this a Java bug
> (1.5.0.12)? or something else?

It's not a Java bug.
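The parser simply decodes the bytes with whatever encoding it is given or auto-detects. This is a minimal sketch, assuming the JDK's bundled JAXP parser and Java 7 or later; the class `ExplicitEncodingParse`, the method `rootText` and the sample text are hypothetical. The same bytes parse cleanly when the `InputSource` is told they are ISO-8859-1, and fail when left to the default UTF-8 detection: the byte 0xF6 (ö in ISO-8859-1) has the bit pattern of a four-byte UTF-8 lead byte (11110xxx), and the 't' after it is not a continuation byte. That is one plausible way to end up with an error like the one in the subject line.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExplicitEncodingParse {

    /**
     * Parses XML bytes and returns the root element's text content.
     * If encoding is null the parser auto-detects (UTF-8 when there is no
     * BOM or XML declaration); otherwise the byte stream is decoded with it.
     */
    static String rootText(byte[] xml, String encoding) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document saved as ISO-8859-1, with no XML declaration:
        // the ö is stored as the single byte 0xF6.
        byte[] latin1Xml = "<r>G\u00f6teborg</r>".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(rootText(latin1Xml, "ISO-8859-1")); // decodes correctly

        try {
            rootText(latin1Xml, null); // decoded as UTF-8: 0xF6 followed by 't' is malformed
        } catch (Exception e) {
            System.out.println("UTF-8 decoding failed: " + e.getMessage());
        }
    }
}
```

In both cases the parser did exactly what it was told; only the declared encoding changed.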

> Now this has led to a confusion. I thought ISO-8859-1 is a charset

Did you mean "encoding"?

> which is subset of UTF-8. Then why didn't UTF-8 work whereas
> ISO-8859-1 worked?

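Because you were wrong. The two encodings differ. They agree only on
the ASCII range, 0x00 through 0x7F. ISO-8859-1 stores each character
from U+0080 to U+00FF as a single byte; UTF-8 encodes each of those
same characters as two bytes.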
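Why ISO-8859-1 appears to "work": every byte value from 0x00 to 0xFF is a valid ISO-8859-1 character, so decoding any byte sequence as ISO-8859-1 never reports an error, even when the result is wrong. This is a minimal sketch, assuming Java 7 or later; the class name, method name and sample word are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class EncodingsDiffer {

    static final String TEXT = "G\u00f6teborg"; // "Göteborg": one non-ASCII character, U+00F6

    /** Encodes s as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin1 = TEXT.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = TEXT.getBytes(StandardCharsets.UTF_8);
        System.out.println(latin1.length); // 8: ö is the single byte F6
        System.out.println(utf8.length);   // 9: ö is the two bytes C3 B6

        // No exception is thrown, but the result is the string "GÃ¶teborg"
        // (U+00C3 U+00B6 in place of the ö).
        System.out.println(utf8ReadAsLatin1(TEXT));
    }
}
```

An absence of decoding errors under ISO-8859-1 therefore says nothing about whether the text was decoded correctly.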

If you have an assumption, let's call it a hypothesis, and the
evidence contradicts the hypothesis, then the hypothesis is wrong.
Simple.

--
Lew