Re: Parsing CSV files
Yes, after some time I have a parser that I like, but it has a lot of hand
coding in it. I agree that it is a matter of taste how the strings are
formed, but unfortunately, I don't have a lot of control over the input to
our program sometimes. I'm not a big fan of the \ escape thing in CSV files
since that seems odd to uninitiated users.
Not having the separator should be considered a syntax error though. That
much seems fair. We've mostly gone to XML for input and output these days
and that's solved a lot of issues, but raised a whole lot of other ones of
course.
Tom
"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
news:ef#g5K6mKHA.5692@TK2MSFTNGP04.phx.gbl...
Tom Serface wrote:
One thing most parsers don't handle correctly, that I've seen, is
double double quotes for strings if you want to have a quote as part of
the string like:
"This is my string "Tom" that I am using", "Next token", "Next token"
In the above, from my perspective, the parser should read the entire
first string since we didn't come to a delimiter yet, but a lot of
tokenizers choke on this sort of thing.
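A rough sketch of that reading rule in Python, assuming every embedded quote is followed by something other than a delimiter. The function name split_lenient and its parameters are made up for illustration. A quote closes a field only when the next non-blank character is the separator or the end of the line.

```python
def split_lenient(line, sep=",", quote='"'):
    """Split one line; a quote ends a field only if a separator or end of line follows."""
    fields = []
    i, n = 0, len(line)
    while i <= n:
        # Skip blanks in front of the field.
        while i < n and line[i] == " ":
            i += 1
        if i < n and line[i] == quote:
            j = i + 1
            while True:
                k = line.find(quote, j)
                if k == -1:
                    raise ValueError("unterminated quoted field")
                # Look past blanks after the candidate closing quote.
                m = k + 1
                while m < n and line[m] == " ":
                    m += 1
                if m == n or line[m] == sep:
                    fields.append(line[i + 1:k])
                    i = m + 1
                    break
                # Not followed by a delimiter: treat the quote as data.
                j = k + 1
        else:
            # Unquoted field: runs up to the next separator.
            k = line.find(sep, i)
            if k == -1:
                k = n
            fields.append(line[i:k].strip())
            i = k + 1
    return fields


if __name__ == "__main__":
    sample = '"This is my string "Tom" that I am using", "Next token", "Next token"'
    print(split_lenient(sample))
```

This still misreads a field where an embedded quote happens to sit right before a comma, so it is a tolerance trick for sloppy input rather than a full answer.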
Often, it takes two to tango. A writer needs to escape tokens in order to
reach some level of sanity, e.g. borrowing the C backslash escape, \".
"This is my string \"Tom\" that I am using"
Or use some encoding method, e.g. HTTP escaping! :)
The above is simple if you are just delimiting by comma, but you still
have to watch for an embedded comma. For example:
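For what it's worth, here is a small sketch of the backslash convention using Python's standard csv module, with doublequote=False and escapechar set to a backslash on both the writing and reading side. The helper names write_escaped and read_escaped are just for illustration.

```python
import csv
import io


def write_escaped(rows):
    """Write rows with every field quoted and embedded quotes written as \\"."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL,
                        doublequote=False, escapechar="\\")
    writer.writerows(rows)
    return buf.getvalue()


def read_escaped(text):
    """Read text produced by write_escaped back into lists of fields."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        doublequote=False, escapechar="\\")
    return [row for row in reader]


if __name__ == "__main__":
    rows = [['This is my string "Tom" that I am using', "Next token"]]
    text = write_escaped(rows)
    print(text)
    print(read_escaped(text))
```

The point is that both ends have to agree on the convention; a reader that expects doubled quotes will not understand a file written this way.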
"This is my string "Tom, Hector" that I am using"
That can be easily handled if the design assumption is each field is
double quoted. The first token:
"This is my string "Tom,
does not end in double quote, so you continue with a concatenation of the
next token.
Hector" that I am using"
to complete the first field.
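The concatenation idea above can be sketched like this in Python, assuming (as stated) that every field is double quoted. The name split_and_rejoin is hypothetical. It splits on the comma first, then keeps gluing tokens back together until the accumulated piece ends in a double quote.

```python
def split_and_rejoin(line, sep=","):
    """Split on sep, then rejoin tokens until each field ends in a double quote."""
    fields = []
    pending = None  # partial field still waiting for its closing quote
    for token in line.split(sep):
        # Put back the separator that split() removed from inside the field.
        piece = token if pending is None else pending + sep + token
        stripped = piece.strip()
        if len(stripped) >= 2 and stripped.startswith('"') and stripped.endswith('"'):
            fields.append(stripped[1:-1])
            pending = None
        else:
            pending = piece
    if pending is not None:
        raise ValueError("unterminated quoted field: " + pending)
    return fields


if __name__ == "__main__":
    sample = '"This is my string "Tom, Hector" that I am using", "Next token"'
    print(split_and_rejoin(sample))
```

It breaks down when an embedded quote lands directly before a comma, as in "Tom", since that token already looks complete.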
But overall, I found that unless it's really simple, it helps if you have
field type definitions known beforehand.
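One way to picture known field types, as a sketch only: a list of (name, converter) pairs applied to the already split fields. The SCHEMA contents and the convert_row name are invented for the example and are not from any real file format.

```python
# Hypothetical record layout: field name plus the function that converts its text.
SCHEMA = [("name", str), ("count", int), ("price", float)]


def convert_row(fields, schema=SCHEMA):
    """Convert a list of field strings into a dict using the known field types."""
    if len(fields) != len(schema):
        raise ValueError("expected %d fields, got %d" % (len(schema), len(fields)))
    return {name: conv(text.strip()) for (name, conv), text in zip(schema, fields)}


if __name__ == "__main__":
    print(convert_row(["Widget", " 3", "9.50"]))
```

With the layout known up front, a wrong field count or a non-numeric value in a numeric column shows up as an error right away instead of as a quietly shifted row.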
--
HLS