Re: A proposal to handle file encodings

From:

Arne Vajhøj <arne@vajhoej.dk>

Newsgroups:

comp.lang.java.programmer

Date:

Thu, 22 Nov 2012 20:25:15 -0500

Message-ID:

<50aed080$0$292$14726298@news.sunsite.dk>

On 11/22/2012 4:36 PM, Roedy Green wrote:

> The problem with encodings is they are not attached in any way or
> embedded in any way in a file. You are just supposed to know how a
> file is encoded.

> Here is my idea to solve the problem.

> We invent a new encoding.

> Files in this encoding begin with a 0 byte, then an ASCII string
> giving the name of a conventional encoding then another 0 byte.

> When you read a file with this encoding, the header is invisible to
> your application. When you write a file, a header for a UTF8 file gets
> written automatically.

> You write your app telling it to read and write this new encoding e.g.
> "labeled".
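
To make the quoted proposal concrete, here is a minimal sketch of what
writing and reading such a "labeled" file could look like in Java. The
class name LabeledText and its two methods are made up for illustration;
nothing like this exists in the JDK, and the header layout is only what
the quoted text describes (0 byte, ASCII encoding name, 0 byte).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class LabeledText {

    /** Prepend the proposed header: 0 byte, ASCII charset name, 0 byte. */
    static byte[] encode(String text, Charset cs) {
        byte[] name = cs.name().getBytes(StandardCharsets.US_ASCII);
        byte[] body = text.getBytes(cs);
        // A new byte[] is zero-filled, so the leading and terminating
        // 0 bytes are already in place at index 0 and name.length + 1.
        byte[] out = new byte[name.length + 2 + body.length];
        System.arraycopy(name, 0, out, 1, name.length);
        System.arraycopy(body, 0, out, name.length + 2, body.length);
        return out;
    }

    /** Strip the header and decode the body with the charset it names. */
    static String decode(byte[] data) {
        if (data.length < 2 || data[0] != 0) {
            throw new IllegalArgumentException("missing leading 0 byte");
        }
        int end = 1;
        while (end < data.length && data[end] != 0) {
            end++;
        }
        if (end == data.length) {
            throw new IllegalArgumentException("unterminated encoding name");
        }
        String name = new String(data, 1, end - 1, StandardCharsets.US_ASCII);
        // Throws an unchecked exception if this JVM does not know the name.
        Charset cs = Charset.forName(name);
        return new String(data, end + 1, data.length - end - 1, cs);
    }

    public static void main(String[] args) {
        byte[] file = encode("Vajh\u00f8j", StandardCharsets.ISO_8859_1);
        System.out.println(decode(file));
    }
}
```

Note that decode() has no answer for a file without the header or with an
encoding name the JVM does not recognize; it simply throws.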

It is a bad idea to have metadata in the file body. This metadata
should be stored where the rest of the file's metadata is stored.

But even if it were moved to the file info area, I doubt
the idea is good.

It enforces a limitation that a text file can have only
one encoding. That limitation does not exist today.

There are practical problems:
* different systems support different encodings (and sometimes the
   same encoding has different names) - what should a system
   do with an unknown encoding?
* there will be a huge number of legacy files without this
   metadata - what should a system do with those?
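
The naming problem in the first point is easy to see in Java itself. The
sketch below uses the standard java.nio.charset.Charset lookup; the class
CharsetLookup and the "fall back to a default" policy are assumptions made
for illustration, and that policy is just one possible answer to the
unknown-encoding question, not a recommendation.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of any library.
final class CharsetLookup {

    /** Resolve a label to a Charset, or return the fallback if this JVM rejects it. */
    static Charset resolve(String label, Charset fallback) {
        try {
            // forName accepts canonical names and aliases, case-insensitively.
            return Charset.forName(label);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // The name is malformed or not supported on this system.
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset latin1 = resolve("latin1", StandardCharsets.UTF_8);
        // Same charset, many names: print the canonical name and its aliases.
        System.out.println(latin1.name() + " " + latin1.aliases());
        // A label this JVM does not know silently becomes the fallback.
        System.out.println(resolve("x-no-such-encoding", StandardCharsets.UTF_8).name());
    }
}
```

The set of aliases and supported charsets varies between JVMs and
platforms, so a label written on one system may not resolve on another.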

And even if those problems were solved - would it really create
any benefits?

It would take many years to get such an approach approved and
widely implemented. Most likely >10 years. By that time I would
expect UTF-8 to be almost universally used for new text files,
making this proposal obsolete.

> You can write a utility to import files into your labelled universe by
> detecting or guessing or being told the encoding.

Which just repeats the existing problems.

> It gets a header.
> Other than that the file is unmodified.

Solved much more easily by using metadata.
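
As a rough sketch of the metadata route, Java 7's NIO.2 API can store a
user-defined attribute next to a file on file systems that support
extended attributes. The class EncodingAttribute and the attribute name
"text.encoding" are assumptions made for illustration; no standard
attribute name is implied, and many file systems and transfer tools do not
preserve such attributes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;

// Hypothetical helper, not part of any library.
final class EncodingAttribute {
    // Assumed attribute name, chosen only for this example.
    static final String ATTR = "text.encoding";

    /** Store the charset name as file metadata; false if that is not possible here. */
    static boolean tag(Path file, Charset cs) {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (view == null) {
            return false; // this file system has no user-defined attributes
        }
        try {
            view.write(ATTR, StandardCharsets.US_ASCII.encode(cs.name()));
            return true;
        } catch (IOException | RuntimeException e) {
            return false; // e.g. attributes disabled or not permitted
        }
    }

    /** Read the stored charset name, or null if the file carries none. */
    static String read(Path file) {
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (view == null) {
            return null;
        }
        try {
            ByteBuffer buf = ByteBuffer.allocate(view.size(ATTR));
            view.read(ATTR, buf);
            buf.flip();
            return StandardCharsets.US_ASCII.decode(buf).toString();
        } catch (IOException | RuntimeException e) {
            return null; // attribute missing or unreadable
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("labeled", ".txt");
        try {
            System.out.println("tagged: " + tag(p, StandardCharsets.UTF_8));
            System.out.println("encoding: " + read(p));
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

The file body stays untouched, so programs that know nothing about the
attribute keep working as before.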

Arne