Re: What influences C++ I/O performance?
In article <e0cd98eb-6dc0-4f4f-8780-0354a4556868@h11g2000prf.googlegroups.com>, szhorvat@gmail.com says...
[ ... ]
This is the relevant portion of the code:
I suspect you've done a bit of unintentional editing while separating
out the relevant portion of the code.
std::ios::sync_with_stdio(false);
std::ifstream infile(path);
input_array arr(); // see note below
This is NOT a definition of an input_array named arr. Rather, it's a
declaration of a function named arr that takes no parameters, and
returns an input_array. Until this is fixed, I don't think the rest of
the code can even compile.
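A small compile-time check makes the difference visible. This is only a sketch: `input_array` below is a hypothetical stand-in (the real type isn't shown in the thread), `vexing_parse_demo` is a made-up name, and the brace form needs C++11. Simply dropping the parentheses works with any compiler.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the poster's input_array type.
struct input_array {
    std::vector<int> values;
};

inline std::size_t vexing_parse_demo() {
    input_array arr();   // declares a function named arr; no object is created
    static_assert(std::is_function<decltype(arr)>::value,
                  "arr is a function, so arr.values would not compile");

    input_array a;       // fix 1: no parentheses, default-initialized object
    input_array b{};     // fix 2 (C++11): braces, value-initialized object
    static_assert(!std::is_function<decltype(a)>::value, "a is an object");
    static_assert(!std::is_function<decltype(b)>::value, "b is an object");

    return a.values.size() + b.values.size();  // both vectors start empty
}
```

Most compilers also warn about the `arr()` line, which is worth turning on if it isn't already.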
As far as speed of I/O goes, I'm having difficulty reproducing the
problem you cite. I started by writing a small program to generate a
test file of the size you cited:
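#include <fstream>

// Output path for the test data (adjust for your system).
char filename[] = "c:\\c\\source\\junk.mat";

// Write 100 rows of 25,000 tab-separated integers (row + col).
int main() {
    std::ofstream out(filename);
    for (int row=0; row<100; row++) {
        for (int col=0; col<25000; col++)
            out << row + col << "\t";
        out << "\n";
    }
    return 0;
}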
Then I rewrote your code a bit so I could compile it:
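#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated integers, one matrix row per line,
// into a flat vector in row-major order.
std::vector<int> read_matrix(std::string const &path) {
    std::ifstream infile(path.c_str());
    std::vector<int> arr;
    int row = 0, max_col = 0;
    std::string line;
    while (std::getline(infile, line)) {
        std::istringstream ln(line);
        int number;
        int col = 0;
        while (ln >> number) {
            arr.push_back(number);
            col++;
        }
        if (col != 0) {              // ignore blank lines
            if (row == 0)
                max_col = col;       // first row sets the expected width
            if (col != max_col) {
                /*
                 * Array is not rectangular.
                 * Handle error (exit).
                 */
            }
            row++;
        }
    }
    return arr;
}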
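If you'd rather have the non-rectangular case actually reported, and want to test the parsing without touching the disk, the same loop can read from any `std::istream`. This is a minimal sketch under my own assumptions: `matrix_data`, `read_matrix_stream`, and the choice to throw `std::runtime_error` instead of exiting are all hypothetical, not part of the original code.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: values in row-major order plus the shape.
struct matrix_data {
    std::vector<int> values;
    int rows = 0;
    int cols = 0;
};

// Same algorithm as read_matrix, but reading from any stream and
// throwing (an assumption) when rows have different lengths.
inline matrix_data read_matrix_stream(std::istream &in) {
    matrix_data m;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ln(line);
        int number;
        int col = 0;
        while (ln >> number) {
            m.values.push_back(number);
            ++col;
        }
        if (col == 0)
            continue;              // skip blank lines, as the original does
        if (m.rows == 0)
            m.cols = col;          // first non-empty row fixes the width
        else if (col != m.cols)
            throw std::runtime_error("matrix is not rectangular");
        ++m.rows;
    }
    return m;
}
```

With a file you'd pass an `std::ifstream`; in a unit test an `std::istringstream` holding a few lines is enough.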
Finally, I added a test main to call that and read the file:
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
#include <time.h>

std::vector<int> read_matrix(std::string const &path);  // defined above

char filename[] = "c:\\c\\source\\junk.mat";

int main() {
    clock_t start = clock();
    std::vector<int> r = read_matrix(filename);
    clock_t stop = clock();
    // to ensure against the file-read being optimized away,
    // doing something to use what we read. The total for this file
    // (over 3e10) overflows a 32-bit int, so accumulate in long long.
    long long sum = std::accumulate(r.begin(), r.end(), 0LL);
    std::cout << "sum = " << sum << "\n";
    std::cout << "time = " << double(stop - start) / CLOCKS_PER_SEC << "\n";
    return 0;
}
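One caveat on the timing itself: the C standard defines `clock()` as processor time, so a conforming library may not count time spent waiting on the disk (Microsoft's `clock()` returns elapsed wall-clock time instead). If you repeat the test with a C++11 or later compiler, a monotonic wall-clock timer avoids that ambiguity. A sketch, where `seconds_elapsed` is a hypothetical helper name:

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs f once and returns the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
template <class F>
double seconds_elapsed(F &&f) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Usage would be something like `double t = seconds_elapsed([&] { r = read_matrix(filename); });` with `r` declared beforehand.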
Run times (seconds):
VC++ 7.1:     1.89
Comeau 4.3.3: 2.672
g++ 3.4.4:    4.671
That leaves a few possibilities:
1) the newer version of VC++ has reduced I/O speed a lot.
2) the newer version of g++ has improved I/O speed a lot.
3) the software differences are being hidden by hardware differences.
4) you're not getting what you think from a "release" build in VS 2008.
Of these, the first seems possible but fairly unlikely (they both use
the Dinkumware library, and I don't think it's changed all that much
between these versions).
The second seems possible, but not to the degree necessary to explain
what you've observed. In particular, the executable I get from VC++ 7.1
reads the data quite a bit faster than one fifth of the claimed speed of
my hard drive, so speeding it up by 5:1 shouldn't be possible. Of course,
caching could make a difference (e.g., a second run might read from the
cache much faster than the hard drive itself could deliver). I at least
attempted to factor this out in my testing, though, and while caching
could have contributed something, I can't find any indication that it
accounts for a large difference.
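To put numbers on that argument: the generator above writes 2,500,000 values from 0 to 25,098, each followed by a tab, plus one newline per row. The sketch below (helper names are hypothetical) counts those bytes without writing the file and converts a run time to MB/s. It assumes each "\n" is a single byte; in Windows text mode it becomes "\r\n", adding 100 bytes, which doesn't change the conclusion.

```cpp
#include <cstdint>

// Number of decimal digits needed to print a non-negative int.
inline int decimal_digits(int v) {
    int d = 1;
    while (v >= 10) {
        v /= 10;
        ++d;
    }
    return d;
}

// Bytes written by the generator: each value row+col plus a tab,
// and one newline per row (assumes "\n" is written as one byte).
inline std::int64_t generated_file_bytes(int rows, int cols) {
    std::int64_t bytes = 0;
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < cols; ++col)
            bytes += decimal_digits(row + col) + 1;  // digits + '\t'
        bytes += 1;                                  // '\n'
    }
    return bytes;
}

// Throughput in (decimal) megabytes per second for a given run time.
inline double megabytes_per_second(std::int64_t bytes, double seconds) {
    return static_cast<double>(bytes) / 1e6 / seconds;
}
```

For 100 rows by 25,000 columns this gives 13,904,895 bytes (about 13.9 MB), so the 1.89 second VC++ 7.1 time is roughly 7.4 MB/s, and a 5:1 speedup would mean reading at roughly 37 MB/s.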
I'd guess the third is probably true to some degree -- but, again, I
don't see anything that would account for the major differences we're
seeing.
To me, that leaves the last possibility as being by far the most likely.
Of course, there may also be some entirely different possibility that
hasn't occurred to me.
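One quick way to check the fourth possibility is to have the test program report how it was built. This sketch only looks at the conventional macros: MSVC defines `_DEBUG` when a debug runtime (/MDd or /MTd) is selected, and Visual Studio's default Release configuration defines `NDEBUG`. It can't see optimization switches such as /O2, so treat it as a sanity check rather than proof; `build_flavor` is a hypothetical name.

```cpp
// Hypothetical helper: reports which conventional build macros are defined.
inline const char *build_flavor() {
#if defined(_DEBUG)
    return "debug runtime (_DEBUG defined)";
#elif defined(NDEBUG)
    return "release-style (NDEBUG defined)";
#else
    return "unknown (neither _DEBUG nor NDEBUG defined)";
#endif
}
```

Printing it next to the timing, e.g. `std::cout << "build = " << build_flavor() << "\n";`, makes it obvious if a "release" run is actually using the debug runtime.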
--
Later,
Jerry.
The universe is a figment of its own imagination.