Re: safely reading large files
byte8bits@gmail.com wrote:
How does C++ safely open and read very large files? For example, say I
have 1GB of physical memory and I open a 4GB file and attempt to read
it like so:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
    string line;
    ifstream myfile ("example.txt", ios::binary);
    if (myfile.is_open())
    {
        while (getline (myfile, line)) // loop on the read itself, not on eof()
        {
            cout << line << endl;
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}
In particular, what if a line in the file is longer than the amount of
available physical memory? What would happen? It seems getline() would
cause a crash. Is there a better way? Maybe... check the amount of free
memory, then use 10% or so of that amount for the read. So if 1GB of
memory is free, take 100MB for file IO. If only 10MB is free, then just
read 1MB at a time. Repeat this step until the file has been read
completely. Is something built into standard C++ to handle this? Or is
there an accepted way to do this?
Actually, handling operations that can run out of memory is not a simple
thing at all. First, getline() into a std::string shouldn't crash
outright: when the string can no longer grow, the allocation fails and
throws std::bad_alloc, which you can catch. Beyond that, if you can
estimate how much memory you will need and you somehow know how much is
available, then you can allocate a chunk, operate on that chunk until
done, and move on to the next chunk. In the good ol' days that's how we
solved large systems of linear equations, one piece of the matrix at a
time (or two if the algorithm called for it).
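For the file-reading case you show, a minimal sketch of that chunked
approach might look like the following; the 1MB chunk size and the
processing step are placeholders, not a recommendation:

#include <fstream>
#include <iostream>
#include <vector>

int main () {
    const std::size_t chunk_size = 1024*1024;  // 1MB per read; pick your own
    std::vector<char> buffer(chunk_size);      // bounded, reused for every read
    std::ifstream myfile ("example.txt", std::ios::binary);
    if (!myfile.is_open()) {
        std::cout << "Unable to open file";
        return 1;
    }
    while (myfile) {
        myfile.read(&buffer[0], buffer.size());
        std::streamsize got = myfile.gcount(); // bytes actually read this pass
        // process 'got' bytes of buffer here; as a stand-in, copy them out:
        std::cout.write(&buffer[0], got);
    }
    return 0;
}

Memory use stays bounded by chunk_size no matter how big the file or how
long its "lines" are, because nothing ever accumulates the whole file.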
Unfortunately there is no single straightforward solution. In most
cases you don't even know that you're going to run out of memory until
it's too late. You can write the program to handle those situations
using C++ exceptions. The pseudo-code might look like this:
std::size_t chunk_size = 1024*1024*1024;
MyAlgorithm algo;
bool done = false;
do {
    try {
        algo.prepare_the_operation(chunk_size);
        // if we get here, the chunk_size is OK
        algo.perform_the_operation();
        algo.wrap_it_up();
        done = true;         // success -- don't go around again
    }
    catch (std::bad_alloc & e) {
        chunk_size /= 2;     // or any other adjustment
    }
}
while (!done && chunk_size > 1024*1024); // or some other threshold
That way if your preparation fails, you just restart it using a smaller
chunk, until you either complete the operation or your chunk is too
small and you can't really do anything...
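To make that concrete with the file-reading example, here is one way the
allocation-retry part might be written; acquire_buffer is my own name,
not a standard facility, and the floor value is arbitrary:

#include <new>     // std::bad_alloc
#include <vector>

// Try to obtain a read buffer of 'want' bytes, halving the request
// each time the allocation throws std::bad_alloc.
std::vector<char> acquire_buffer(std::size_t want, std::size_t floor)
{
    while (want >= floor) {
        try {
            return std::vector<char>(want);
        }
        catch (std::bad_alloc &) {
            want /= 2;       // shrink the request and try again
        }
    }
    return std::vector<char>(); // empty: couldn't get even 'floor' bytes
}

You'd then hand the resulting buffer to a read loop like the one above,
and bail out (or report an error) if it comes back empty.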
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask