Re: Reading an array from file?

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Mon, 10 Aug 2009 01:57:17 -0700 (PDT)

Message-ID:

<4f75c139-2e29-4870-8df3-00ee8995cdea@j9g2000vbp.googlegroups.com>

On Aug 9, 6:41 am, Jerry Coffin <jerryvcof...@yahoo.com> wrote:

In article <bdbbf8aa-5da2-443a-bd20-98969e1b7633
@v2g2000vbb.googlegroups.com>, james.ka...@gmail.com says...
[ Figuring out valid names on a server ]

Not when you're creating new files. And most of my programs
don't run under a GUI; they're servers, which run 24 hours a
day. Of course, they don't run under Windows either, so the
question is moot:-). But the question remains---picking up the
name from a GUI is fine for interactive programs, but a lot of
programs aren't interactive.

Even when the main program isn't interactive, configuration
for it can be.

Ultimately you're right though -- it would be nice if you
could depend on (for example) being able to query a server
about some basic characteristics of a shared/exported file
system, so you could portably figure out what it allows. Right
now, virtually all such "knowledge" is encoded implicitly in
client code (or simply doesn't exist -- the client just passes
a string through and hopes for the best).

The problem is: there is a more or less guaranteed minimum that
will be portable, say one to six alphanumeric characters, a dot,
and a single alphabetic character. And no directory paths.
That's awfully restrictive, however, and you almost never need
that extreme. Trying to define just how far you can deviate
from it, however, is not always obvious; in today's world, I
tend to suppose two strings which would be valid C/C++ symbols,
separated by a '.', and a maximum of 14 characters, including
the '.'. But it's more complex than that, because some systems
ignore case, others don't, and some treat the text after the dot
specially.
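
As a rough illustration of the "two C/C++ symbols separated by a
'.', 14 characters maximum" rule of thumb above, here is a minimal
sketch. The function names are hypothetical, and the check uses
explicit ASCII ranges on purpose, so that the result doesn't depend
on the current locale. It says nothing about case sensitivity or
about systems which treat the extension specially.

```cpp
#include <cassert>   // only needed for the example checks in the note below
#include <string>

namespace {

bool is_ascii_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

// True if s would be a valid C/C++ identifier (ASCII only, non-empty).
bool is_c_symbol(const std::string& s)
{
    if (s.empty() || !(is_ascii_alpha(s[0]) || s[0] == '_')) {
        return false;
    }
    for (char c : s) {
        if (!(is_ascii_alpha(c) || is_ascii_digit(c) || c == '_')) {
            return false;
        }
    }
    return true;
}

}   // namespace

// Hypothetical helper: accepts "symbol.symbol", at most 14 characters
// in all, including the dot. Anything else (paths, a second dot,
// a leading digit) is rejected.
bool is_conservative_filename(const std::string& name)
{
    if (name.size() > 14) {
        return false;
    }
    const std::string::size_type dot = name.find('.');
    if (dot == std::string::npos) {
        return false;
    }
    // A second dot ends up in the extension, where is_c_symbol rejects it.
    return is_c_symbol(name.substr(0, dot))
        && is_c_symbol(name.substr(dot + 1));
}
```

With this sketch, is_conservative_filename("data.txt") is true, while
"a.b.c", "9lives.txt" and "dir/file.txt" are all rejected.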

[ ... ]

And that the file name matches, somehow. But typically, this
isn't the case---I regularly share files between systems, and
this seems to be the case for everyone where I work.

I wish I could offer something positive here, but I doubt I
can. Ultimately, this depends more on the FS than the OS
though -- just for example, regardless of the OS, an ISO 9660
FS (absent something like Joliet extensions) places draconian
restrictions on file names.

Yes. Posix defined a function pathconf, which allows obtaining
some file system specific information, but it's still very Posix
oriented---it only returns information about things which can
vary on a Posix filesystem. Whereas in practice, the problems
I encounter are because I'm using both Windows and various Unix.
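
For what it's worth, a query through pathconf might look something
like the following sketch (Posix only; the NameLimit struct and
query_name_max are names invented for the example). It asks for
_PC_NAME_MAX, the maximum length of a filename in the given
directory. Note that pathconf returns -1 both on error and when
there is no fixed limit, so errno has to be cleared first to tell
the two apart.

```cpp
#include <cassert>    // only needed for the example checks in the note below
#include <cerrno>
#include <unistd.h>   // pathconf, _PC_NAME_MAX (Posix, not standard C++)

// Hypothetical result type for the example.
struct NameLimit
{
    bool ok;        // false if pathconf reported an error
    bool limited;   // false if the file system reports no fixed limit
    long value;     // the limit, meaningful only if ok && limited
};

// Asks the file system holding 'directory' for its maximum filename length.
NameLimit query_name_max(const char* directory)
{
    errno = 0;
    const long result = pathconf(directory, _PC_NAME_MAX);
    if (result != -1) {
        return NameLimit{true, true, result};
    }
    if (errno == 0) {
        return NameLimit{true, false, -1};    // no fixed limit
    }
    return NameLimit{false, false, -1};       // error, details in errno
}
```

This only tells you how long a name may be. Nothing here says whether
the file system ignores case, or which characters it refuses, which
is exactly the sort of thing that differs between Windows and Unix.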

[ ... ]

I agree that standard (and simple) solutions exist. Putting
a BOM at the start of a text file allows immediate
identification of the encoding format. But how many editors
that you know actually do this?

A few -- Windows Notepad knows how to create and work with
UTF-8, UTF-16LE, UTF-16BE, all including BOMs (or whatever you
call the UTF-8 signature).

I'd call it a BOM:-). The idea being that if the reader knows
(or can reasonably assume) it is dealing with Unicode, writing
0xFEFF as the first character allows the application to
determine the transformation format being used by simply reading
the first couple of bytes (maximum four). (I tend to "ignore"
byte order, as such, and simply think in terms of transformation
format---although the only difference between some
transformation formats is the byte order.) Of course, if the
reader can assume Unicode, you don't need a BOM for UTF-8: the
BOM in the other transformation formats ensures that one of the
first four bytes will be 0xFF, and another 0xFE, neither of
which can be present in UTF-8.
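
A minimal sketch of that check on the first bytes of a file (the
enum and function names are made up for the example). The UTF-32
patterns have to be tested before the UTF-16 ones, since FF FE 00 00
starts with the UTF-16LE BOM. Strictly speaking, that sequence could
also be UTF-16LE text beginning with a U+0000 character; the sketch
assumes that this doesn't happen in real text files.

```cpp
#include <cassert>   // only needed for the example checks in the note below
#include <cstddef>

enum class BomFormat { None, Utf8, Utf16BE, Utf16LE, Utf32BE, Utf32LE };

// Looks for an encoded U+FEFF in the first n bytes at p.
// Only the first four bytes are ever examined.
BomFormat detect_bom(const unsigned char* p, std::size_t n)
{
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF) {
        return BomFormat::Utf32BE;
    }
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00) {
        return BomFormat::Utf32LE;
    }
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        return BomFormat::Utf8;
    }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        return BomFormat::Utf16BE;
    }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        return BomFormat::Utf16LE;
    }
    return BomFormat::None;
}
```

If the reader can assume Unicode, a result of BomFormat::None can
then be taken to mean UTF-8 without a signature.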

In practice, of course, there's still a lot of non-Unicode
floating around as well, and not all Unicode files contain a
BOM, so things get more complicated. Even when limiting myself
to Unicode, I'll read the first four bytes. If there's a BOM,
fine. If there's not, I'll look for 0x00 bytes: if their
position and number correspond to one of the UTF-16 or UTF-32
formats (supposing that the first two characters have a Unicode
encoding of less than 0xFF), I'll assume that format. It's not
guaranteed, and will almost certainly fail if I get a file with
Chinese text and no BOM, but it works often enough to be
worthwhile. At least in my environment (where files with
Chinese text are very rare).
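
The no-BOM heuristic might be sketched along these lines. The names
are invented for the example, and it supposes that the caller has
already found that there is no BOM and has exactly four bytes to
look at. It just matches the pattern of 0x00 bytes which the first
two characters would produce if both were below 0xFF and neither
were U+0000.

```cpp
#include <array>
#include <cassert>   // only needed for the example checks in the note below

enum class Guess { Utf32BE, Utf32LE, Utf16BE, Utf16LE, ByteOriented };

// Guesses the transformation format from where the zero bytes fall.
// ByteOriented means "no UTF-16/UTF-32 pattern seen": UTF-8, or some
// non-Unicode 8 bit encoding.
Guess guess_without_bom(const std::array<unsigned char, 4>& b)
{
    const bool z0 = b[0] == 0;
    const bool z1 = b[1] == 0;
    const bool z2 = b[2] == 0;
    const bool z3 = b[3] == 0;

    if (z0 && z1 && z2 && !z3) return Guess::Utf32BE;    // 00 00 00 xx
    if (!z0 && z1 && z2 && z3) return Guess::Utf32LE;    // xx 00 00 00
    if (z0 && !z1 && z2 && !z3) return Guess::Utf16BE;   // 00 xx 00 yy
    if (!z0 && z1 && !z2 && z3) return Guess::Utf16LE;   // xx 00 yy 00
    return Guess::ByteOriented;
}
```

As expected, UTF-16BE Chinese text without a BOM (say the bytes
4E 2D 65 87) contains no zero bytes at all, and comes out as
ByteOriented, which is wrong.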

The current version of Visual Studio also seems to work fine
with UTF-8 and UTF-16 (BE & LE) text files as well. It
preserves the BOM and endianness when saving a modified version
-- but if you want to use it to create a new file with
UTF-16BE encoding (for example) that might be a bit more
difficult (I haven't tried very hard, but I don't
immediately see a "Unicode big endian" option like Notepad
provides).

I'm afraid I can't help you there. (Now if it were vim...)
But it sounds like Microsoft is being inconsistent.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34