Re: Refactoring question

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Wed, 16 Dec 2009 01:12:52 -0800 (PST)
Message-ID: <4f4427b8-bee7-440a-a011-bcbac7032eb4@r1g2000vbp.googlegroups.com>

On 15 Dec, 23:21, Brian <c...@mailvault.com> wrote:

On Dec 15, 2:49 am, James Kanze <james.ka...@gmail.com> wrote:

On Dec 15, 3:56 am, Brian <c...@mailvault.com> wrote:

I'm a little bit confused by your names. Are you receiving
(getting) or sending (putting)?

  void
  Receive(uint16_t value)
  {
    if (thisFormat_ == otherFormat_) {
      Receive(&value, sizeof(value));
    } else {
      if (least_significant_first == otherFormat_) {
        Put( (value ) & 0xFF );
        Put( (value >> 8) & 0xFF );
      } else {
        Put( (value >> 8) & 0xFF );
        Put( (value ) & 0xFF );
      }
    }
  }

I'd expose a little bit more of the buffer details here, perhaps
with a "reserve" function, so that you don't have to test whether
there's room for each byte. Note, in fact, that your buffering is
different if the formats are the same.

That's true, but I'm not sure it matters.

It depends on what you're doing with the data later. If you're writing
it to disk, it doesn't matter. If you're sending it in packets on the
line, it depends on the protocol involved, but it could matter a lot.

Which do you want: to ensure that both bytes are in the same buffer,
or that the buffer is as full as possible?

I guess adding a Reserve function would be one way to address this.
I'm not sure the buffering has to be uniform, but perhaps a Reserve
function would be useful in avoiding the check of overflow with each
byte.
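
A minimal sketch of what such a Reserve function might look like. The
names SendBuffer, Reserve, PutUnchecked, SendLittleEndian and Flushed are
hypothetical, not taken from the code under discussion, and Flush merely
collects completed buffers instead of writing them anywhere. The point is
only that one capacity check covers the whole value, so both bytes of an
integer land in the same buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buffer; all names here are illustrative assumptions.
class SendBuffer {
public:
    explicit SendBuffer(std::size_t capacity) : capacity_(capacity) {
        data_.reserve(capacity);
    }

    // Guarantee room for `count` more bytes, flushing first if necessary.
    // After this call, `count` calls to PutUnchecked are safe.
    void Reserve(std::size_t count) {
        assert(count <= capacity_);
        if (data_.size() + count > capacity_) {
            Flush();
        }
    }

    // Append one byte without an overflow test; the caller must have
    // called Reserve beforehand.
    void PutUnchecked(std::uint8_t byte) { data_.push_back(byte); }

    // Send a 16-bit value, least significant byte first, with a single
    // capacity check for both bytes.
    void SendLittleEndian(std::uint16_t value) {
        Reserve(2);
        PutUnchecked(static_cast<std::uint8_t>(value & 0xFF));
        PutUnchecked(static_cast<std::uint8_t>((value >> 8) & 0xFF));
    }

    // Stand-in for handing the buffer to the disk or the network: the
    // completed buffers are simply kept so that they can be inspected.
    void Flush() {
        if (!data_.empty()) {
            flushed_.push_back(data_);
            data_.clear();
        }
    }

    const std::vector<std::vector<std::uint8_t> >& Flushed() const {
        return flushed_;
    }
    std::size_t Pending() const { return data_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::uint8_t> data_;
    std::vector<std::vector<std::uint8_t> > flushed_;
};
```

With a capacity of 3, sending two 16-bit values flushes a buffer holding
only two bytes: the value stays whole, at the cost of a buffer that is
not completely full.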

Or modify the "Receive" function to check after each byte (or to copy
all that fits, then a second copy for what's left). Or state clearly
that whether the buffers are full or not is unspecified.
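
For comparison, a rough sketch of the "copy all that fits, then a second
copy for what's left" alternative. FillingBuffer and its members are
hypothetical names, and Flush again only collects the completed buffers.
Here every flushed buffer is full, but a multi-byte value may straddle
two of them:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buffer that always fills completely before flushing.
class FillingBuffer {
public:
    explicit FillingBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void Send(const void* source, std::size_t size) {
        const std::uint8_t* p = static_cast<const std::uint8_t*>(source);
        while (size > 0) {
            // Copy whatever fits now; the loop handles the remainder.
            const std::size_t room = capacity_ - data_.size();
            const std::size_t n = std::min(room, size);
            data_.insert(data_.end(), p, p + n);
            p += n;
            size -= n;
            if (data_.size() == capacity_) {
                Flush();
            }
        }
    }

    // Stand-in for the real output: keep completed buffers for inspection.
    void Flush() {
        if (!data_.empty()) {
            flushed_.push_back(data_);
            data_.clear();
        }
    }

    const std::vector<std::vector<std::uint8_t> >& Flushed() const {
        return flushed_;
    }
    std::size_t Pending() const { return data_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::uint8_t> data_;
    std::vector<std::vector<std::uint8_t> > flushed_;
};
```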

Having decided the buffering strategy, there's no point in the first
if. It just adds complexity, for no gain.
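
To illustrate the single-path version, here is a minimal sketch. It
assumes a free function PutUint16 and an enumerator
most_significant_first, neither of which appears in the posted code
(only least_significant_first does), and it appends to a std::vector
rather than to the real buffer. Shifts and masks work on the value, not
on its representation in memory, so the host's own byte order never has
to be consulted:

```cpp
#include <cstdint>
#include <vector>

// most_significant_first is an assumed name for the other wire format.
enum ByteOrder { least_significant_first, most_significant_first };

// Append `value` to `out` in the byte order the peer expects. The same
// code is correct whatever the byte order of the machine running it.
inline void PutUint16(std::vector<std::uint8_t>& out,
                      std::uint16_t value,
                      ByteOrder otherFormat) {
    const std::uint8_t low = static_cast<std::uint8_t>(value & 0xFF);
    const std::uint8_t high = static_cast<std::uint8_t>((value >> 8) & 0xFF);
    if (otherFormat == least_significant_first) {
        out.push_back(low);
        out.push_back(high);
    } else {
        out.push_back(high);
        out.push_back(low);
    }
}
```

Whether this is measurably slower than copying the raw bytes when the
formats match is something only a benchmark on the actual code can tell.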

I think that if statement helps performance-wise.
Here's some text from my site:

"With -O3 and 500,000 elements, the Boost Serialization version is
between 2.3 and 2.8 times slower than the Ebenezer version... When no
formatting is needed because the reader has the same format as the
writer, the numbers are near the high end, (2.8), of the range. When
formatting is needed because the two machines have different byte
orders, the numbers are near the low end of the range." That is found
on this page: http://webEbenezer.net/comparison.html.

I'm not familiar with the Boost serialization, so I couldn't say. I
would hope that it achieves total portability, including support for
non-two's complement. Which does add to the cost.

The test I'm describing is writing data to the hard drive.
I pretend in the one case that the formats are different and
measure the time both ways.

And I'd definitely call this function Send, and not Receive.

I picked up the term "put" from some of your previous posts, I
think. Well, the buffer is receiving data in order to send
it. That's why I call it Receive.

The function is pushing data out, which is why I would call it Send.
(The client code calls it to send data, not to receive it.)

[...]

Again, the two branches of the if implement different buffering
strategies. Before going any further, I think you have to define
this.

I know, but so what?

If you think it doesn't matter, then define it as unspecified. I'm not
sure that I like the idea that an integer might span two buffers, but
whether it matters really depends on what happens with the buffers
later.

The second form is needed for correctness and both forms put the same
amount of information onto the stream and the ints themselves are in
the same sequence (but with different byte order) in either case. In
"Effective TCP/IP Programming" it says, "There is no such thing as a
'packet' for a TCP application. An application with a design that
depends in any way on how TCP packetizes the data needs to be
rethought."

That's TCP. Applications don't talk to one another in TCP; they use
some higher level protocol. (And of course, they may also use other
protocols, like UDP, for the lower level.)

[...]

Any decisions concerning data format should be negotiated in the
connection protocol. They shouldn't be evaluated on the fly, nor
change during a connection.

The decisions are determined in the connection protocol. I've thought
about making this a template parameter, but didn't like what I was
coming up with when I looked at that. The thisFormat_ isn't changeable
after the buffer is constructed. The otherFormat_ is but that is to
permit the same buffer to be used to handle requests from both little
and big endian users. The way things are set up now, it is possible
that a programming error could lead to incorrectly setting the
otherFormat_ in the middle of a connection, but that doesn't seem like
a likely problem to me.

Have you considered the possibility of using the strategy pattern?
It's possible that a virtual function call could be cheaper than all
your if's, and the resulting code would certainly be more readable.
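
A rough sketch of how the strategy pattern might apply here. The names
ByteOrderStrategy, LeastSignificantFirst, MostSignificantFirst and
Connection are all hypothetical, and a std::vector stands in for the
real buffer. The format is chosen once, when the connection is set up,
and Send contains no tests at all:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical strategy interface: one implementation per wire format.
class ByteOrderStrategy {
public:
    virtual ~ByteOrderStrategy() {}
    virtual void Put(std::vector<std::uint8_t>& out,
                     std::uint16_t value) const = 0;
};

class LeastSignificantFirst : public ByteOrderStrategy {
public:
    void Put(std::vector<std::uint8_t>& out,
             std::uint16_t value) const override {
        out.push_back(static_cast<std::uint8_t>(value & 0xFF));
        out.push_back(static_cast<std::uint8_t>((value >> 8) & 0xFF));
    }
};

class MostSignificantFirst : public ByteOrderStrategy {
public:
    void Put(std::vector<std::uint8_t>& out,
             std::uint16_t value) const override {
        out.push_back(static_cast<std::uint8_t>((value >> 8) & 0xFF));
        out.push_back(static_cast<std::uint8_t>(value & 0xFF));
    }
};

// The strategy is fixed at construction, e.g. once the peer's format has
// been negotiated, and cannot be changed afterwards.
class Connection {
public:
    explicit Connection(std::unique_ptr<ByteOrderStrategy> format)
        : format_(std::move(format)) {}

    // A single virtual call replaces the nested if's.
    void Send(std::uint16_t value) { format_->Put(buffer_, value); }

    const std::vector<std::uint8_t>& Buffer() const { return buffer_; }

private:
    std::unique_ptr<ByteOrderStrategy> format_;
    std::vector<std::uint8_t> buffer_;
};
```

Because the strategy has no setter, a programming error cannot change
the format in the middle of a connection.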

[...]

I think you have to define a higher level protocol to begin with.
(And although it's possible, and I've seen at least some formats
which do so, I'm not convinced that there's any advantage in
supporting different representations.)

I've no idea what you mean by that last sentence.

The impression I have here (but I don't see the entire context, only
what you've posted) is that you're putting the cart before the horse.
Before writing a single line of code, you should specify the protocol,
exactly. From what you seem to be doing, you've got in mind a protocol
which supports two different representations for integers, and at least
two for floats. I've seen some protocols which do this, but I'm not
convinced there's any advantage in doing so. (I suspect that most
protocols which support several different representations do so for
historical reasons. They were initially disk based, and read and wrote
a binary image. Only later, when they attempted to read an image written
on a different machine, was the problem realized and addressed, in a way
that didn't break any existing files.)

--
James Kanze