Re: complexity for tellg()
On Feb 20, 7:23 pm, "P.J. Plauger" <p...@dinkumware.com> wrote:
"toton" <abirba...@gmail.com> wrote in message
news:1171980085.871968.153450@h3g2000cwc.googlegroups.com...
On Feb 20, 11:46 am, "Alf P. Steinbach" <a...@start.no> wrote:
* toton:
Hi,
I am reading a big file, and need to record the current file
position so that I can store the positions for later direct access.
However, it looks like tellg is a very costly function! But its code says
it should just return the current buffer position, and thus should be a
very low-cost function.
To explain,
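{
    boost::progress_timer t;  // prints the elapsed time when the scope ends
    std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
    std::string line;
    while (in) {
        // std::streampos rather than int: positions may not fit in an int
        std::streampos pos = in.tellg();
        std::getline(in, line);
    }
}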
This code takes 0.58 sec on my computer with the in.tellg() line
commented out, while with the line enabled it takes 120.8 sec (varies a little).
Can anyone say the reason and a possible workaround?
I am using MS Visual Studio 7.1 and the std library provided with Visual
Studio 7.1.
Most likely the cause is conversion of CRLF to LF, which you've
specified by (1) opening the file in text mode and (2) compiling with a
Windows compiler.
One cure could then be to open the file in binary mode, and handle
newlines as appropriate (or not).
There are enough bad things related to newlines...
seekg and tellg don't match when the newline char is \n and the file is
opened in text mode.
That shouldn't be, if you're just using seekg to return to a place
earlier memorized by tellg.
For the Unix file,
std::string line;
while (in) {
    std::streampos pos = in.tellg();  // start of the line about to be read
    std::getline(in, line);
    std::cout << pos << " " << line << std::endl;
    if (line == ".PEN_DOWN") {
        in.seekg(pos);  // go back to the start of the .PEN_DOWN line
        break;
    }
}
std::getline(in, line);  // This doesn't print .PEN_DOWN!
std::cout << line << std::endl;
Now if I open it in binary mode, this problem is solved.
But it creates another set of problems:
for the Unix file it is now fine, but for the Windows file \r is left at
the end of each line, as the newline char is \n. So I need to remove \r from
the line if it is present.
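Removing that \r takes only a few lines. A sketch, with the hypothetical name strip_trailing_cr, to be called on each line returned by std::getline from a binary-mode stream:

```cpp
#include <string>

// Hypothetical helper: drops a single trailing '\r', if present, so that
// lines from \r\n files compare equal to lines from \n files.
void strip_trailing_cr(std::string& line) {
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
}
```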
If you wrote the file in binary mode, the \r characters wouldn't
be appended in the first place. It is important that you read and
write consistently, at least if you don't want to deal with local
conventions for reading and writing text files.
I wonder what getline will return in the case of a Mac file, where the
newline terminator is \r only. Will it return the whole file as a single
line?
If you write in text mode and read in binary mode, that could happen,
yes.
Is there any std API support to take care of all these things and yet
keep seekg & tellg consistent?
Yes, it's called the Standard C++ library, if you use it right.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Maybe I have not expressed the problem clearly.
1) I am not writing the file, I am only reading it. It is a text
file, but nothing is fixed: the line terminator may be \n, \r\n, or
\r. It all depends on who saved the file and with which editor.
So this is a question about parsing...
The file looks something like this
.X_DIM 20701
.Y_DIM 27000
.X_POINTS_PER_MM 100
.Y_POINTS_PER_MM 100
.POINTS_PER_SECOND 200
.COMMENT YES_PRES_ORG 0
.COMMENT YES_PRES_EXT 1023
.DT 3975234
.PEN_DOWN
.COMMENT .PEN_WIDTH 1
.COMMENT .PEN_WIDTH_ORG 1
.COMMENT .PEN_COLOR 0x0
Now I need to remember a past position using tellg() and go back to that
position using seekg().
The cases are:
1) The file is opened in text mode and contains \n as the terminator.
seekg doesn't place the file pointer at the position saved by tellg (as
shown in my previous program). It works as expected when the newline is
\r\n.
2) The file is opened in binary mode. When it contains \n as the line
terminator, seekg & tellg work as expected. When it contains \r\n as the
terminator, the returned string contains \r, which needs to be
removed.
3) This one I haven't tested. Several Mac files have \r as the newline
char. What will std::getline(stream, str) return? The whole file or
just one line?
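For case 3, std::getline with its default '\n' delimiter finds nothing to split on in \r-only data read without translation, so it returns everything as one line. Passing '\r' as the delimiter splits it. A sketch using in-memory data in place of a real Mac file (count_lines is a hypothetical name):

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts how many lines std::getline produces
// for the given data when splitting on the given delimiter.
std::size_t count_lines(const std::string& data, char delim) {
    std::istringstream in(data);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line, delim)) ++n;
    return n;
}
```

With the data "a\rb\rc", splitting on '\n' gives a single line, while splitting on '\r' gives three.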
Thus my question is: how do I determine which newline char is in use, so
that I can parse all of the files properly?
It should be noted that the files are not written by me; I just read them.
And all the tests were done with MSVC 7.1; gcc might give just the
opposite result (I will check it quickly).