Re: complexity for tellg()
"toton" <abirbasak@gmail.com> wrote in message
news:1171982693.542560.226510@v45g2000cwv.googlegroups.com...
.....
Maybe I did not express the problem clearly.
1) I am not writing the file, only reading it. It is a text
file, but the line terminator is not fixed: it may be \n, \r\n, or
\r. It all depends on who saved the file and with which editor.
Then it *is* fixed, but not by you. If, as Pete Becker said, the
file was written as text on one system and read on another, the
lines might not be terminated as the reading system expects. And if
you read the file as binary, you have to know what line terminators
look like.
So this is the question for parsing ...
The file looks something like this
.X_DIM 20701
.Y_DIM 27000
.X_POINTS_PER_MM 100
.Y_POINTS_PER_MM 100
.POINTS_PER_SECOND 200
.COMMENT YES_PRES_ORG 0
.COMMENT YES_PRES_EXT 1023
.DT 3975234
.PEN_DOWN
.COMMENT .PEN_WIDTH 1
.COMMENT .PEN_WIDTH_ORG 1
.COMMENT .PEN_COLOR 0x0
Now I need to remember an earlier position using tellg(), and return
to that position using seekg().
The cases are:
1) The file is opened in text mode. The file contains \n as terminator.
seekg does not place the file pointer at the position saved by tellg
(as shown in my previous program). It works as expected when the
newline is \r\n.
You're violating the Windows notion of a text file, so it's possible
you're confusing the underlying C library, which the Standard C++
library uses for basic file operations. Convert the file to Windows
format and seekg/tellg should work fine.
2) The file is opened in binary mode. The file contains \n as line
terminator.
seekg and tellg work as expected.
Right. No surprise.
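
For reference, a minimal sketch of case 2: open the file in binary
mode, save a position with tellg(), read on, then return with seekg().
The function name reread_second_line and its parameters are made up
for illustration, and the sketch assumes the file has at least two
lines.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads line 2 twice, using tellg()/seekg() to
// return to the start of that line. The file is opened in binary mode,
// so the saved position is a plain byte offset.
bool reread_second_line(const char* path,
                        std::string& first_pass,
                        std::string& second_pass)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::string skipped;
    if (!std::getline(in, skipped))          // consume line 1
        return false;

    const std::ifstream::pos_type mark = in.tellg();  // start of line 2
    if (mark == std::ifstream::pos_type(-1))
        return false;

    if (!std::getline(in, first_pass))       // first read of line 2
        return false;

    in.clear();                              // drop eofbit if we hit end of file
    in.seekg(mark);                          // back to the saved position
    return !std::getline(in, second_pass).fail();
}
```

With a \n-terminated file both reads should yield the same text. With
a \r\n file they also match, but each string still ends in \r.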
The file contains \r\n as terminator. The returned string contains \r,
which needs to be removed.
Yep. You're now violating the C/C++ conventions for text streams
internal to a program. The \r is considered part of the text line,
not part of the line terminator.
3) This one I haven't tested. Several Mac files have \r as the newline
character. What will std::getline(stream, str) return? The whole file
or only one line?
The whole file, unless you specify \r as the line terminator.
So my question is: how do I check which newline character to use, so
that I can parse all of the files properly?
Well, you have to know what they are, don't you? Or at least all the
possible options. One approach is to read the file as binary and be
prepared for any of \n, \n\r, \r, or \r\n as line terminators. It's
kinda hard to use getline directly that way, but you can write your
own version.
Note that the files are not written by me; I only read them.
All the tests were done with MSVC 7.1; gcc might give just the
opposite result (I will check it quickly).
I think "opposite" is an oversimplification. It's just different.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com