Re: Java Serializing in C/C++?

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Fri, 29 Feb 2008 16:15:39 -0800 (PST)

Message-ID:

<c4c3ab80-4e63-497c-a4f1-5c30a85055f7@64g2000hsw.googlegroups.com>

    [I've taken the comp.lang.java.programmer out of the
    cross-posting, because my comments here are really only
    relevant to the low level accesses which can only be done in
    C++.]
On Feb 29, 11:50 pm, c...@mailvault.com wrote:

On Feb 28, 9:41 am, James Kanze <james.ka...@gmail.com> wrote:

On Feb 27, 9:27 pm, c...@mailvault.com wrote:
I don't quite see where it would make a difference. One way or
another, the application has to know the length of any given
message (supposing a message oriented protocol, of course).
Having provided that, if there is a maximum message length, then
it can (and should) reject any message over that length. I
don't see how reading the length from a header is different in
this regard from reading a message type from a header, and
knowing that that type of message has a length of n. Or even
reading the message byte by byte, and determining the end by
some sort of internal message structure. If you decide that
your code will handle messages of no more than n bytes, then any
time you get a message of more than n bytes, you reject it.

When you write "determining the end by some sort of internal
message structure," I think we are on the same page, but I
don't think it has to be read "byte by byte." (Assuming you
mean a system call when you say read.)

I do, and yes, depending on the structure, you might be able to
read it in some number of blocks, with less reads than bytes.
But for something like a variable length array of variable
length strings, you're going to need at least twice as many
reads as there are strings---in fact, a few more.

Instead of having one buffer used for both output and input,
there could be a buffer dedicated to sending data and another
buffer used only to receive data. If you happen to read more
than is needed by the current msg, the extra data should be
used to build the next msg.

If you're reading from a Unix pipe, you can't attempt a read
unless you know that all of the bytes you attempt to read will
be part of the message. If you try to read 100 bytes from a
Unix pipe, your process will block until it has actually read
100 bytes (or the pipe was closed on the write side, or you get
a signal, or a couple of other things which don't concern us
here). If the message only contains 80 more bytes, then you
will block until another message has been sent.

It's possible to determine how many bytes are in the pipe (using
stat()), but that is, again, another system call, and an
additional complication.

This approach requires more memory for the separate buffers,
but it doesn't have to calculate/send/recv the total msg
length for every msg.

It depends. Suppose the Message-Type contains a
variable length array.

You know from the msgid and version number that the
message has one high-level element - a variable length
array.

One or more high-level elements. The message might contain
several variable length arrays (and other data as well).
(Note that strings are often transmitted as a variable
length array of char. So any message which contains several
strings is likely to fall into this category.)

The way I would do it, there would be unique msg IDs for msgs
that consist of one string, two strings, or a variable number
of strings:
MsgManager
  (string) @MSGID_1
  (string, string) @MSGID_2
  (vector<string>) @MSGID_3
}
Since the vector<string> msg could handle any of those msgs, I
might get rid of the first two. If the user chooses to use
total msg lengths, the header would consist of a msg ID and
the total msg len. There wouldn't be an element count as part
of the header because the msg ID determines that. MSGID_1 and
MSGID_3 each have one high-level element. MSGID_2 has two.

But that doesn't change things, really. You now have a message
with a variable number of elements, each element having a
variable length.

Yes, but given the internal msg structure why do you need an
element- count in the msg hdr? It seems like the msg ID
wouldn't have much meaning if an element-count is also needed.

If the element contains one or more variable length arrays, just
knowing the message id isn't sufficient. (But what I was
talking about putting in the header was the message length. In
bytes. So you read the header, decode the length, and read
that. And have all of the message, and nothing but the message.)

Say the msg consists of a vector with 70 strings in it, would
the element-count be 1 or 70? K-e-n didn't say anything about
element-counts but just an element- count. Are you saying
that if the msg consisted of a vector<string> and a list<int>
there would be two element-counts in the header?

I'm saying that usually, I would expect to see the message
length in the header. Along with the message id. The message
length permits me to read all of the rest of the message in one
system request. The mesage id defines the structure of the
message. Elements in the message which consist of a variable
number of sub-elements will likely be encoded with the number of
sub-elements, possibly recursively, although other schemes are
possible; e.g. an element which is an array will start with the
number of elements in the array, and a string will start with
the number of characters in the string. (XDR would be a simple
example of a binary encoding using this scheme.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34