Re: equivalent of realloc in C++
Francis Glassborow wrote:
Boris Rasin wrote:
People who run actual tests (like Andrei here) say the savings make a
big difference. These results happen to correspond completely to
seemingly obvious conclusions considering typical heap fragmentation
and relatively huge price of copying versus practically zero price of
in-place reallocation.
But I am unhappy with the test :( It was for a very specific type of
code where the expanding vector was the last item in the address space.
It is also only relevant if there is no reasonable way for the process
to ascertain how large the vector needs to be.
Actually my tests included both vector<int> and vector<string>, which
means (in the latter case) there is an allocation of a string for each
push_back. Significant gains were measured. What I inferred happened is
that after an unsuccessful in-place allocation, there is a free block
left behind. That block is too small to be considered for future seating
of the vector, but large enough to hold many strings.
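A quick way to visualize that allocation traffic - not one of the programs discussed in this thread, just an illustrative sketch - is to log every allocation made through operator new; each push_back of a long enough string shows up as its own small allocation, interleaved with the occasional, progressively larger reallocation of the vector's buffer (on a typical implementation):

#include <cstdio>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Log every allocation so the mix of small per-string allocations and
// occasional large vector-buffer growth becomes visible.
void * operator new(std::size_t n)
{
    std::printf("alloc %zu\n", n);
    if (void * p = std::malloc(n)) return p;
    throw std::bad_alloc();
}
void operator delete(void * p) noexcept { std::free(p); }

int main()
{
    std::vector<std::string> v;
    for (int i = 0; i != 8; ++i)
        v.push_back(std::string(40, 'x')); // long enough to force a heap buffer
}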
Now the problem with all this is that for most programmers, most of the
time, the 'realloc' type expansion will, at best, show no profit and in
some cases will be a pessimisation.
Well Francis, of all people you'd be the last I thought would rely on
"I'll just say it" as a method of making an argument :o).
There is a large amount of confirmation bias
(http://www.sciencedaily.com/articles/c/confirmation_bias.htm) in this
thread, and I've considered leaving the discussion alone more than once.
Howard Hinnant was even wiser and just didn't intervene because he has
failed before with a formal proposal, so what's the chance of convincing
anyone on the Usenet? What seems to be happening is that people start
from the notion that in-place memory expansion would be unhelpful
"because if it were someone would have introduced it in C++ by now" (a
form of trust in authority), and work back from there. Even after I and
others pointed out the fallaciousness of such arguments, there was no
shortage of suppositions pulled out of thin air describing entirely
hypothetical scenarios, one less ingenious than the other.
So I did what I probably should have done earlier - sat down and wrote
some code. The first little program simply loads lines from a file:
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    vector<string> vec;
    // Append an empty string and read the next line straight into it;
    // stop when getline fails at end of input.
    while ((vec.push_back(string()), getline(cin, vec.back())))
    {
    }
    // Remove the extra empty element added on the final, failed read.
    vec.pop_back();
}
Then I wrote a program that does the same task, except it uses realloc
exclusively. I don't have access to an in-place expansion routine on
Linux (is there one?), so I wrote an implementation that is suboptimal -
it calls realloc *every single pass through the loop*. Pretty silly, eh?
#include <cstdlib>
#include <iostream>
#include <new>
#include <string>

using namespace std;

int main()
{
    string * vec = 0;
    size_t vecSize = 0;
    for (;; ++vecSize)
    {
        // Grow the buffer by exactly one element per iteration.
        vec = static_cast<string*>(
            realloc(vec, (vecSize + 1) * sizeof(string)));
        if (!vec) return 1;
        // Construct a string in the new slot, then read a line into it.
        new(vec + vecSize) string;
        if (!getline(cin, vec[vecSize])) break;
    }
}
This code is not well-defined because, at least in theory, string could
hold internal pointers, so realloc is not guaranteed to work. GNU's
string doesn't, so the code works well enough for the measurements of
interest (not that I'd recommend it).
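Whether a particular string implementation keeps its characters inside the object itself - exactly the case where bitwise relocation via realloc/memcpy breaks - can be probed by checking whether data() points into the object's own footprint. A minimal sketch, illustrative only and not part of the measured code (the function name is made up):

#include <functional>
#include <string>

// Returns true if the string's character buffer lives inside the string
// object itself (small-string optimization), i.e. the case where moving
// the object with realloc/memcpy would leave a dangling internal pointer.
bool holds_internal_pointer(const std::string & s)
{
    const void * p = s.data();
    const void * lo = &s;
    const void * hi = reinterpret_cast<const char *>(&s) + sizeof s;
    std::less<const void *> before; // well-defined total order for any pointers
    return !before(p, lo) && before(p, hi);
}

On an implementation whose string always allocates its buffer externally (as the GNU string used here apparently does), this returns false for every string, which is presumably why the realloc trick happens to work there.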
Now before you take a look at the numbers, I suggest you build a mental
image of what you *believe* the numbers should be. How should the two
versions compare? Then scroll down.
.... scroll down s'more ...
$ wc src.trn.sntlen
1478564 1478564 14628022 src.trn.sntlen
$ repeat 6 time ./test-realloc < src.trn.sntlen
./test-realloc < src.trn.sntlen 3.16s user 0.04s system 99% cpu 3.203 total
./test-realloc < src.trn.sntlen 3.13s user 0.06s system 99% cpu 3.194 total
./test-realloc < src.trn.sntlen 3.14s user 0.05s system 99% cpu 3.212 total
./test-realloc < src.trn.sntlen 3.16s user 0.04s system 99% cpu 3.203 total
./test-realloc < src.trn.sntlen 3.18s user 0.06s system 99% cpu 3.239 total
./test-realloc < src.trn.sntlen 3.12s user 0.06s system 99% cpu 3.178 total
$ repeat 6 time ./test-vector < src.trn.sntlen
./test-vector < src.trn.sntlen 3.32s user 0.09s system 100% cpu 3.408 total
./test-vector < src.trn.sntlen 3.38s user 0.05s system 100% cpu 3.427 total
./test-vector < src.trn.sntlen 3.36s user 0.05s system 99% cpu 3.430 total
./test-vector < src.trn.sntlen 3.38s user 0.06s system 99% cpu 3.443 total
./test-vector < src.trn.sntlen 3.38s user 0.03s system 99% cpu 3.413 total
./test-vector < src.trn.sntlen 3.37s user 0.06s system 100% cpu 3.436 total
The test reveals a 6.5% relative improvement in the average run time of
the naive realloc-based implementation (I ignored the first run and
averaged over the next five runs, which yield a 3.205s average run time
for realloc and a 3.430s average run time for vector). Both programs
were compiled with -O3.
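For the record, leaving out the first run of each batch, the averages work out to:

  (3.194 + 3.212 + 3.203 + 3.239 + 3.178) / 5 = 3.205s  (realloc)
  (3.427 + 3.430 + 3.443 + 3.413 + 3.436) / 5 = 3.430s  (vector)
  (3.430 - 3.205) / 3.430 ~ 6.5%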
This is in spite of the fact that the realloc version uses no smarts -
no size doubling, no nothing, just a realloc with the new size on every
pass through the loop. Also, the run time includes the time taken by
getline in both cases, so the relative advantage of realloc is likely
even larger. I included that time intentionally, to show that the
improvement is large enough to make itself felt in the bottom-line run
time.
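For the curious, here is a sketch of what a less naive variant might look like - not one of the programs measured above, and still subject to the same caveat about realloc-ing non-trivially-copyable objects - growing the buffer geometrically rather than one element at a time:

#include <cstdlib>
#include <iostream>
#include <new>
#include <string>

using namespace std;

int main()
{
    string * vec = 0;
    size_t vecSize = 0;
    size_t vecCap = 0;
    for (;; ++vecSize)
    {
        if (vecSize == vecCap)
        {
            // Double the capacity (starting at 16) instead of growing by one.
            vecCap = vecCap ? 2 * vecCap : 16;
            vec = static_cast<string*>(realloc(vec, vecCap * sizeof(string)));
            if (!vec) return 1;
        }
        new(vec + vecSize) string;
        if (!getline(cin, vec[vecSize])) break;
    }
}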
Andrei
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]