Re: Sizes and types for network programming

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Fri, 17 Sep 2010 03:36:04 -0700 (PDT)
Message-ID: <0e145cff-d37b-4549-9d80-07640bf9c082@w4g2000vbh.googlegroups.com>

On Sep 16, 8:27 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:

On Sep 16, 2:26 am, James Kanze <james.ka...@gmail.com> wrote:

On Sep 15, 8:43 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:

On Sep 14, 6:23 pm, Michael Hull <mikehul...@googlemail.com> wrote:

I have a question, and I hope it's not obvious!
As I understand it, C++ does not define the number of bits in a byte;
this is architecture dependent. The one thing we always know is that
sizeof(char) == 1. The sizes of int, long, etc. will be architecture
dependent, i.e. the number of bits in an int on a 32-bit machine may be
different from that on a 64-bit machine.

I'm not sure exactly what you are saying, so let me make sure it's
clear. sizeof(char) == 1, aka the size of a char is exactly one C++
byte. A C++ byte is not the same thing as a byte in other contexts.

No. C++ requires that a byte be at least 8 bits: historically,
6 and 7 bit bytes were common. (The reason Fortran uses such
a small character set, and doesn't distinguish case, is that it
was first developed on a machine with 6 bit bytes.) Also, C++
requires that an integral number of bytes occupy all of the bits
in any integral type: a PDP-10 traditionally used 5 seven bit
bytes in a 36 bit word (but the size of a byte was programmable,
so 4 nine bit bytes would work for C/C++).
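
The two guarantees mentioned above (a byte has at least 8 bits, and every
type occupies a whole number of bytes) can be checked at compile time. What
follows is only a minimal sketch, assuming a C++11 compiler; the names
bits_in, byte_is_octet and check_byte_model are made up for illustration and
are not part of any library.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT: number of bits in a C++ byte
#include <cstddef>   // std::size_t

// Both of these hold on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a C++ byte has at least 8 bits");

// Hypothetical helper: bits in the object representation of T.
// sizeof counts C++ bytes, so the result is always a multiple of CHAR_BIT.
template <typename T>
constexpr std::size_t bits_in() noexcept {
    return sizeof(T) * CHAR_BIT;
}

// Hypothetical helper: true only where the C++ byte happens to be an octet.
constexpr bool byte_is_octet() noexcept {
    return CHAR_BIT == 8;
}

// Hypothetical runtime sanity check of the same properties.
inline void check_byte_model() {
    assert(bits_in<char>() == CHAR_BIT);
    assert(bits_in<int>() % CHAR_BIT == 0);
}
```

On a common desktop byte_is_octet() is true; on a machine with 9 bit or 32 bit
bytes it is false, yet all of the assertions still hold.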

Yo James. I'm not sure what's going on here. It sounds like you're
trying to correct some misinformation, but I don't understand exactly
how you're correcting me. You ripped my post apart, but I don't see
any real disagreement, nor any real corrections.

The only real misinformation was the impression your post gave,
that the "common" meaning of byte (outside of C/C++) was eight
bits. I wasn't disagreeing with your facts concerning C++, but
trying to point out that in the non-C++ context, byte is even
less precise than in C++; that your statement "In other
contexts, generally a byte is an octet, aka 8 bits" isn't quite
true---you may have said "generally", but historically, even
"generally" wouldn't hold. There are differences between the
C++ definition of byte (e.g. in C++, a byte can be the same size
as a word, and it must be at least 8 bits, neither of which is
true "generally"), but your posting seemed to give the
impression that the C++ definition was somehow "looser"; that in
other contexts a byte could only be 8 bits.

For example, why say "No."? I said that sizeof(char) ==
1 always, and I said a C++ byte may not be the same thing as
byte in other contexts, like Java, common desktops, common
networking, etc.

The intent of the standard is that the C++ byte be the same
thing as a byte in other contexts. (Not all other contexts, of
course. Java redefines byte even more than C++.) There are
only two cases where this should be violated: on machines which
don't have bytes, and on machines whose natural byte size is
less than 8 bits.

On common desktops, bytes are 8 bits. Both in general use, and
in C++.

In common networking, there aren't bytes. Networking protocols
are defined in terms of octets, not bytes, precisely because
byte isn't precise enough for them.

Exactly to what
does "No." refer? You mention hardware which does not have 8 bit bytes
as though that contradicts something which I said. I don't see what
that could possibly be. Moreover, immediately following this quote in
the same post, I admit of the existence of hardware which does not
have 8 bit bytes, so I don't see the benefit of singling out my post
and adding this as though you're correcting me.

In other contexts, generally a byte is an octet, aka 8 bits.

This is the most frequent situation today. It wasn't in the
past, and the first use of byte referred to six bit bytes.

There's still one platform today where a byte is 9 bits. And
I think some of the embedded processors punt, and make a byte 32
bits (and sizeof(int) 1); this doesn't correspond to the
classical definition, however, which requires that a byte be
smaller than a word.

Indeed. Would you be happier if I said "in other contexts /today/,
generally a byte is an octet, aka 8 bits."?

Not really, unless you defined the contexts. There are still
machines being sold today with 9 bit bytes.

Moreover, in the very next sentence of my first post in the thread
(quoted below), I mention that there is hardware, (today) exotic,
which does not have 8 bit bytes.

According to the C++ standard, and some exotic hardware
perhaps, a C++ byte may be 8 bits, 9 bits, 64 bits, etc. Thus,
the effective size of char is also implementation dependent.
Ex: you can't serialize a 64 bit char to a data stream, then
deserialize those 64 bits of information into an 8 bit char on
a regular desktop.

Sure you can (and people do). It just requires some special
handling.

No, you cannot take the arbitrary information in a 64 bit char on one
system and shove that into an 8 bit char on another system, which is
exactly what I said. However, yes you can do serialization between
separate hardware which has differently sized bytes, which I did not
deny.

You cannot take arbitrary information on one system, and just
shove it into some arbitrary data type on another system.
That's true even if the size of byte is the same on both
systems. You certainly can serialize a 64 bit char in a way
that it can be read, without loss of information, on a system
with 8 bit char.
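
One possible form of the "special handling" is to define the wire format in
octets and convert at each end. The following is a rough sketch under stated
assumptions, not a description of any particular protocol: it assumes C++11,
an 8 octet, most-significant-first layout chosen here purely for illustration,
and the hypothetical function names to_octets and from_octets.

```cpp
#include <cassert>
#include <cstdint>   // std::uint_least64_t (available even where bytes are not octets)
#include <vector>

// Hypothetical encoder: split a value of up to 64 bits into 8 octets,
// most significant first. Every element is in 0..255, so it fits in an
// unsigned char on any implementation, whatever CHAR_BIT is.
inline std::vector<unsigned char> to_octets(std::uint_least64_t value) {
    std::vector<unsigned char> out;
    out.reserve(8);
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFFu));
    }
    return out;
}

// Hypothetical decoder: rebuild the value from exactly 8 octets.
// The mask discards any bits above the low 8 on machines with wider bytes.
inline std::uint_least64_t from_octets(const std::vector<unsigned char>& octets) {
    assert(octets.size() == 8);
    std::uint_least64_t value = 0;
    for (unsigned char octet : octets) {
        value = (value << 8) | (octet & 0xFFu);
    }
    return value;
}
```

The receiving side with 8 bit char cannot hold the result in a single char, of
course; it needs a type of at least 64 bits, which is why the decoder returns
std::uint_least64_t rather than char.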

--
James Kanze