Re: Refactoring question

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Fri, 18 Dec 2009 00:49:16 -0800 (PST)
Message-ID: <50f0c3a8-c870-45e9-9768-99a20080f4fb@21g2000yqj.googlegroups.com>

On Dec 17, 7:45 pm, Brian <c...@mailvault.com> wrote:

On Dec 17, 2:39 am, James Kanze <james.ka...@gmail.com> wrote:

On Dec 16, 8:15 pm, Brian <c...@mailvault.com> wrote:

On Dec 16, 3:12 am, James Kanze <james.ka...@gmail.com> wrote:

On 15 Dec, 23:21, Brian <c...@mailvault.com> wrote:


    [...]

Which do you want: to ensure that both bytes are in the same
buffer, or that the buffer is as full as possible?

I guess adding a Reserve function would be one way to address
this. I'm not sure the buffering has to be uniform, but
perhaps a Reserve function would be useful in avoiding the
overflow check on each byte.
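
Roughly the sort of thing I have in mind (just a sketch; Reserve is a
hypothetical addition, and it assumes SendStoredData() leaves index_
at 0):

  // Hypothetical helper: make sure at least n bytes fit in the
  // buffer, flushing first if they don't, so the caller can then
  // append n bytes without checking for overflow on each one.
  void
  Reserve(unsigned int n)
  {
    if (n > bufsize_ - index_) {
      SendStoredData();    // assumed to leave index_ == 0
    }
    // caller's responsibility: n <= bufsize_
  }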

Or modify the "Receive" function to check after each byte (or to
copy all that fits, then a second copy for what's left). Or
document clearly that whether buffers end up full or not is
unspecified.

I just tried a version of Receive that copies all that fits and
then does a second copy for the balance. The lines marked with a
plus sign are new and are the only thing that changed.
  void
  Receive(void const* data, unsigned int dlen)
  {
    if (dlen > bufsize_ - index_) {
      memcpy(buf_ + index_, data, bufsize_ - index_); // +
      data += bufsize_ - index_; // +
      dlen -= bufsize_ - index_; // +
      SendStoredData();

      if (dlen > bufsize_) {
        PersistentWrite(data, dlen);
        return;
      }
    }
    memcpy(buf_ + index_, data, dlen);
    index_ += dlen;
  }
The resulting executable is just 200 bytes more, but the time
is over 30% slower than without the change.


That doesn't sound right. The difference is far too big.


I have retested now on Linux and tested for the first time on Windows.
On Linux I still get this large difference. I'll post links to the
source if you like. The original version takes around 16000
microseconds; the full-buffers version around 24000 microseconds. The
full-buffers version gets a warning:
"warning: pointer of type =91void *' used in arithmetic"
that the original version doesn't get.


That should be an error. Arithmetic on void* is illegal.
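
(gcc accepts it as an extension, treating sizeof(void) as 1, but a
conforming version of the same logic has to go through a byte pointer;
something like:)

  void
  Receive(void const* data, unsigned int dlen)
  {
    unsigned char const* p = static_cast<unsigned char const*>(data);
    if (dlen > bufsize_ - index_) {
      memcpy(buf_ + index_, p, bufsize_ - index_);
      p += bufsize_ - index_;
      dlen -= bufsize_ - index_;
      SendStoredData();

      if (dlen > bufsize_) {
        PersistentWrite(p, dlen);
        return;
      }
    }
    memcpy(buf_ + index_, p, dlen);
    index_ += dlen;
  }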

I'm using gcc 4.4.2 with -O3 and -std=c++0x to build the tests. The
test is marshalling a list<int32_t> with 500,000 elements to the disk.
The output file in both cases is 2,000,004 bytes.

On Windows I didn't notice a performance difference between the two
versions. The two executables are exactly the same size though. I'm
not aware of "strip" or "cmp" commands on Windows so am not 100% sure
that the two files are different.


For starters, you might be measuring differences in the way the system
caches file data. I suspect that for any larger amount of data (enough
to significantly overflow all system caches), file access (or serial
line access) will overwhelm all timing measures. (The last time I
measured, on a Sun Sparc, a hard write to disk cost about 10 ms. And
after a couple of meg of output, all output ended up having to wait for
a hard write. From what other people have told me, PCs are
significantly slower in this regard, and Windows buffers a lot less in
the system than Linux. But both of those statements are just hearsay,
at least coming from me.)

You have raised an interesting point, however, and if I can find the
time to get my benchmark harnesses up and running on the machines I now
have access to (none of which I had access to three months ago), I'll
give it a try. For basic algorithms, each copying a buffer of 1024 ints
(just to have something comparable): memcpy, std::copy, a simple
hand-written loop, and a loop using shifting to portably ensure byte
order (and why not, std::transform with a unary function doing the
shifting). This should give a good idea as to how well the compilers
handle such cases (because it really is an issue of compiler
optimization). Whether any differences are important in real life, of
course, depends on the magnitude of the differences, and what else
you're doing with the data: as I said, if you're writing really large
amounts to disk (or the application requires synchronous writes), even
the worst case of the above probably represents less than 1% of the
time.
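
(Concretely, the candidates I mean would look something along these
lines; just a sketch for illustration, not the actual harness:)

  // Sketch of the four candidates: each copies 1024 int32_t's.
  // The first three copy into an int32_t buffer in native byte
  // order; the last writes bytes, using shifts so the result is
  // big-endian regardless of the machine's byte order.
  #include <algorithm>
  #include <cstring>
  #include <stdint.h>

  int const N = 1024;

  void viaMemcpy(int32_t const* src, int32_t* dst)
  {
    std::memcpy(dst, src, N * sizeof(int32_t));
  }

  void viaStdCopy(int32_t const* src, int32_t* dst)
  {
    std::copy(src, src + N, dst);
  }

  void viaHandLoop(int32_t const* src, int32_t* dst)
  {
    for (int i = 0; i != N; ++i)
      dst[i] = src[i];
  }

  void viaShifting(int32_t const* src, unsigned char* dst)
  {
    for (int i = 0; i != N; ++i) {
      uint32_t v = static_cast<uint32_t>(src[i]);
      *dst++ = (v >> 24) & 0xFF;   // high byte first
      *dst++ = (v >> 16) & 0xFF;
      *dst++ = (v >>  8) & 0xFF;
      *dst++ =  v        & 0xFF;
    }
  }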

But the real difference would be downstream: by filling every buffer
to the maximum, you need fewer buffers, which means that downstream,
there are fewer buffers to handle.


Before adopting this approach I'd like to be sure that the
performance on Linux is not perturbed. I find the results
to be surprising, but not shocking.

I'm using a buffer of size 4096 and the only things going into the
buffer are 4-byte integers. I also tried it with this:
if (bufsize_ - index_ > 0) {
}
around the three added lines, but that didn't help.
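
(That is, the three added lines wrapped like this, so the first memcpy
is skipped when the buffer is already exactly full:)

  if (dlen > bufsize_ - index_) {
    if (bufsize_ - index_ > 0) {
      memcpy(buf_ + index_, data, bufsize_ - index_);
      data += bufsize_ - index_;
      dlen -= bufsize_ - index_;
    }
    SendStoredData();
    // ... rest unchanged ...
  }
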
I find this result disappointing, as philosophically I could persuade
myself that always filling up buffers makes sense. Perhaps I'll have
one configuration for files and TCP and another for UDP. Asking files
and TCP to pay for making UDP happy is unreasonable.


I'm fairly sure that you're worrying about the wrong things, and
that the difference won't be significant in a real application.


I don't think so. As Aby Warburg put it, "G-d dwells in minutiae."
I want to know why the Linux version performs poorly.


That quote has been attributed to many different people:-). In this
case, it's really a question of which details are important. You've not
explained what measures you're taking to ensure similar cache behavior,
for example, and similar buffering in output.
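
(The sort of discipline I mean, as a rough sketch: one untimed warm-up
pass so both versions start with the system's file cache in the same
state, then several timed passes, keeping the best. On Linux,
gettimeofday gives a wall-clock microsecond count:)

  #include <sys/time.h>

  long long
  nowMicros()
  {
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec * 1000000LL + tv.tv_usec;
  }

  template <typename Fn>
  long long
  timeBest(Fn fn, int runs)
  {
    fn();                        // warm-up pass, not measured
    long long best = -1;
    for (int i = 0; i != runs; ++i) {
      long long start = nowMicros();
      fn();
      long long elapsed = nowMicros() - start;
      if (best < 0 || elapsed < best)
        best = elapsed;
    }
    return best;
  }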

But the fact that one implementation shows significantly different
performance is interesting, and probably worth pursuing.

--
James Kanze
