Re: What influences C++ I/O performance?

From:

SzH <szhorvat@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Sat, 16 Feb 2008 12:14:51 -0800 (PST)

Message-ID:

<0ab81bd1-1e5a-43eb-8a7f-bfc9d0148d5c@28g2000hsw.googlegroups.com>

Hi Jerry,

Thanks for the reply! Hadn't it been for Google Groups, I wouldn't
have noticed it now, two weeks later.

On Feb 16, 6:17 pm, Jerry Coffin <jcof...@taeus.com> wrote:

In article <e0cd98eb-6dc0-4f4f-8780-0354a4556868
@h11g2000prf.googlegroups.com>, szhor...@gmail.com says...

[ ... ]

This is the relevant portion of the code:

I suspect you've done a bit of unintentional editing while separating
out the relevant portion of the code.

std::ios::sync_with_stdio(false);
std::ifstream infile(path);

input_array arr(); // see note below

This is NOT a definition of an input_array named arr. Rather, it's a
declaration of a function named arr that takes no parameters, and
returns an input_array. Until this is fixed, I don't think the rest of
the code can even compile.

Yes, you're right, it used to be input_array arr(1000 /* some number
here */);
I remove the number but not the parens.

As far as speed of I/O goes, I'm having difficulty reproducing the
problem you cite. I started by writing a small program to generate a
test file of the size you cited:

#include <iostream>

char filename[] = "c:\\c\\source\\junk.mat";

int main() {
        std::ofstream out(filename);

        for (int row=0; row<100; row++) {
                for (int col=0; col<25000; col++)
                        out << row + col << "\t";
                out << "\n";
        }
        return 0;

}

The matrix I used was actually 10 times bigger. Here's a program that
generates something that resembles closely what I actually had. It
should generate a file > 150 MB.

#include <fstream>
#include <cstdlib>

using namespace std;

int myrand() {
    return int( 250000.0*rand()/(RAND_MAX + 1.0) );
}

int main() {
    ofstream out("junk.mat");

    for (int row=0; row < 100; row++) {
        for (int col=0; col < 250000; col++)
            out << myrand() << '\t';
        out << '\n';
    }
    return 0;
}

Then I rewrote your code a bit so I could compile it:

std::vector<int> read_matrix(std::string const &path) {

        std::ifstream infile(path.c_str());

        std::vector<int> arr;

        int row = 0, max_col = 0;

        std::string line;
        while (std::getline(infile, line)) {
            std::istringstream ln(line);
            int number;
            int col = 0;
            while (ln >> number) {
                        arr.push_back(number);
                        col++;
            }
            if (col != 0) {
                        if (row == 0)
                                max_col = col;
                        if (col != max_col) {
                                /*
                                 * Array is not rectangular.
                                 * Handle error (exit).
                                 */
                        }
                        row++;
            }
        }

        return arr;

}

Finally, I added a test main to call that and read the file:

char filename[] = "c:\\c\\source\\junk.mat";

#include <iostream>
#include <numeric>
#include <time.h>

int main() {

        clock_t start = clock();
        std::vector<int> r = read_matrix(filename);
        clock_t stop = clock();

        // to ensure against the file-read being optimized away,
        // doing something to use what we read.
        int sum = std::accumulate(r.begin(), r.end(), 0);
        std::cout << "sum = " << sum << "\n";
        std::cout << "time = " << double(stop - start) / CLOCKS_PER_SEC;

        return 0;

}

Run times:

VC++ 7.1: 1.89
Comeau 4.3.3: 2.672
g++ 3.4.4: 4.671

That leaves a few possibilities:

1) the newer version of VC++ has reduced I/O speed a lot.

I only have VS 2008 Express, so I don't know, but it seems possible.

2) the newer version of g++ has improved I/O speed a lot.

Yes, it did. I tested it with gcc 3.4.5 (mingw), and the program
compiled with the newer version (gcc 4.2.1, dw2 version from mingw) is
more than 2x (twice) faster than the one compiled with old gcc. (!!)

3) the software differences are being hidden by hardware differences.

This is unlikely. As I'll show below, the code generated by VS is CPU-
bound (not disk-bound).

4) you're not getting what you think from a "release" build in VS 2008.

This is unlikely because the program compiled in "debug" mode ran
twice slower than when compiled in "release" mode. But to keep things
reproducible, below I'll show the exact commands used for compiling.

Here's a simplified, almost minimal version of the program I used for
testing. This one can be compiled without additional prerequisites
(like input_array in the original example). Just copy and paste it
into a file.

/* perf.cpp */

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

using namespace std;

int main(int argc, char *argv[]) {
    if (argc != 2) {
        exit(EXIT_FAILURE);
    }

    ios::sync_with_stdio(false);

    ifstream infile(argv[1]);

    string line;
    while (getline(infile, line)) {
        // uncomment this for the second test
        /*
        std::istringstream ln(line);
        int number;
        int col = 0;
        while (ln >> number)
            col++;
        */
        /* Just to make sure that the
         * compiler doesn't optimize anything away,
         * output the line length:
         */
        cout << line.size() << '\n';
    }

    return 0;
}

First, let's just read in every line as strings (i.e. keep the section
that reads the numbers commented).
The timings are (in seconds):

vs: 7.311
gcc: 2.015
dmc: 4.218 (digital mars compiler, v8.50, with STLport)

You'll find the full (unedited) transcript of the command line session
below, complete with compilation flags.

Now let's uncomment the section that reads the numbers, and measure
again. Now the results are:

vs: 51.165
gcc: 8.795
dmc: 18.560

For comparison (not shown in transcript):

gcc 3.4.5 : 23.637
VS without the /O2 options: 1:58.754

Jerry, the results are the same with your version of the program too
(the timings are a little higher, probably because of need to "grow"
the std::vector).

Note that the timings are much higher in the second test. This
suggests that most of the time is spent with parsing the numbers, and
not reading the data from the disk. But the huge difference between VC
++ and gcc persists ... The only reasonable explanation to me is that
VC++'s streams implementation is very slow and gcc's is very fast.
But why is then such a huge difference between my and Jerry's
results? Did VC++ get really this slower since 7.1? Or is the
Express edition crippled in some way?

Or did I do something stupid here? I'd be happiest if someone could
point out some mistake I made, because I'd like to make this work fast
(on the VS forums no one did).

************ Transcript, just read the lines, don't parse numbers
**************

C:\work\temp\perf>cl /O2 /EHsc perf.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08
for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

perf.cpp
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

/out:perf.exe
perf.obj

C:\work\temp\perf>g++ --version
g++ (GCC) 4.2.1-dw2 (mingw32-2)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There
is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

C:\work\temp\perf>g++ -O2 perf.cpp -o perfgcc.exe

C:\work\temp\perf>dmc -o+all perf.cpp -o perfdmc.exe
link perf,perfdmc,,user32+kernel32/noi;

C:\work\temp\perf>timethis "perf junk.mat > NUL"

TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:15 2008

TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:15 2008
TimeThis : End Time : Sat Feb 16 20:37:22 2008
TimeThis : Elapsed Time : 00:00:07.311

C:\work\temp\perf>timethis "perfgcc junk.mat > NUL"

TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:29 2008

TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:29 2008
TimeThis : End Time : Sat Feb 16 20:37:31 2008
TimeThis : Elapsed Time : 00:00:02.015

C:\work\temp\perf>timethis "perfdmc junk.mat > NUL"

TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:35 2008

TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:35 2008
TimeThis : End Time : Sat Feb 16 20:37:39 2008
TimeThis : Elapsed Time : 00:00:04.218

************ Transcript, parse the numbers too **************

C:\work\temp\perf>cl /O2 /EHsc perf.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08
for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

perf.cpp
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

/out:perf.exe
perf.obj

C:\work\temp\perf>g++ -O2 perf.cpp -o perfgcc.exe

C:\work\temp\perf>dmc -o+all perf.cpp -o perfdmc.exe
link perf,perfdmc,,user32+kernel32/noi;

C:\work\temp\perf>timethis "perf junk.mat > NUL"

TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:42:00 2008

TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:42:00 2008
TimeThis : End Time : Sat Feb 16 20:42:51 2008
TimeThis : Elapsed Time : 00:00:51.165

C:\work\temp\perf>timethis "perfgcc junk.mat > NUL"

TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:04 2008

TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:04 2008
TimeThis : End Time : Sat Feb 16 20:43:12 2008
TimeThis : Elapsed Time : 00:00:08.795

C:\work\temp\perf>timethis "perfdmc junk.mat > NUL"

TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:17 2008

TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:17 2008
TimeThis : End Time : Sat Feb 16 20:43:35 2008
TimeThis : Elapsed Time : 00:00:18.560

The test was done on an 1.7 GHz Pentium-M on WinXP.

Of these, the first seems possible but fairly unlikely (they both use
the Dinkumware library, and I don't think it's changed all that much
between these versions).

The second seems possible, but not to the degree necessary to explain
what you've observed. In particular, the executable I get from VC++ 7.1
reads the data quite a bit faster than 1/5th the claimed speed of my
hard drive, so speeding it up by 5:1 shouldn't be possible. Of course,
there could be differences due to caching (e.g. a second run might read
from the cache much faster than the hard drive can support), but I at
least attempted to factor this out in my testing, so while it could have
contributed something, I can't find any indication that it would account
for any large differences.

I'd guess the third is probably true to some degree -- but, again, I
don't see anything that would account for the major differences we're
seeing.

To me, that leaves the last possibility as being by far the most likely.
Of course, there may also be some entirely different possibility that
hasn't occured to me.

--
Later,
Jerry.

The universe is a figment of its own imagination.

P.S. I'm CCing you in case you sent the message two weeks ago, but it
arrived only now.