Re: Merging Iterator Design

From: Greg Herlihy <greghe@pacbell.net>
Newsgroups: comp.lang.c++.moderated
Date: Thu, 22 Mar 2007 21:28:23 CST
Message-ID: <C2286B84.5570%greghe@pacbell.net>

On 3/21/07 8:17 PM, in article
Pine.GHP.4.58.0703212356200.24771@stud3.tuwien.ac.at, "Sebastian Redl"
<e0226430@stud3.tuwien.ac.at> wrote:

On Sat, 17 Mar 2007, Greg Herlihy wrote:

On 3/15/07 3:44 PM, in article gmsnc4-8o3.ln1@satorlaser.homedns.org,
"Ulrich Eckhardt" <eckhardt@satorlaser.com> wrote:

Sebastian Redl wrote:

One way to solve this issue is to simply make proper size a requirement
and declare everything else undefined behaviour.

I'd go for this solution. In practice, any such faulty use would be detected
by the underlying iterator, assuming it uses a checked implementation. All
major C++ standard libraries have such a mode.

The interface is at fault here for not accurately describing its
requirements. Specifically, the interface declares that it accepts a
sequence of single-byte values as its input, but in reality it requires a
sequence of multi-byte values in order to produce correct, byte-swapped
output.

This discrepancy also means that the interface is not "type-safe." A
type-safe version of this interface would accept only multi-byte value
sequences and would not accept single-byte value sequences (as it does
currently) only to wind up blindly scrambling their order.

I'm not sure I understand. Converting a stream of raw bytes into a stream
of larger values by combining those bytes together is inherently not
typesafe. It is a coercion, like a cast.

A "type safe" interface is one in which the types as specified by the
interface provide all of the information needed for the implementation to
produce the expected output. So it is perfectly possible to convert a
sequence of one byte values into a sequence of multi-byte values and to do
so in a type-safe manner. The interface that performs such a conversion
simply has to specify a single-byte type sequence as input, and a multibyte
type sequence as output.

By the same principle, a type-safe routine to change the "endianness" of a
sequence of integer values must specify the type of integers being input
(including their size and signedness) and must specify exactly the same
type for its output.
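
A minimal sketch of such a routine, assuming a C++11 compiler and <cstdint>.
The name byte_swap is hypothetical, and restricting it to unsigned types is a
simplifying assumption, not something this thread settled on. The point it
illustrates is that the return type is the same as the parameter type, so the
width of each value is known from the type alone:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: reverses the byte order of a single unsigned integer.
// Input and output types are identical, so nothing about the value's size
// has to be guessed by the implementation or promised by the caller.
template <typename UInt>
UInt byte_swap(UInt value)
{
    static_assert(std::is_unsigned<UInt>::value &&
                      !std::is_same<UInt, bool>::value,
                  "byte_swap expects an unsigned integer type");
    UInt result = 0;
    for (std::size_t i = 0; i < sizeof(UInt); ++i) {
        // Move the lowest remaining byte of value onto the low end of result.
        result = static_cast<UInt>((result << 8) | (value & 0xFF));
        value = static_cast<UInt>(value >> 8);
    }
    return result;
}

// Usage: the caller states the integer type explicitly.
inline void byte_swap_demo()
{
    assert(byte_swap<std::uint32_t>(0x11223344u) == 0x44332211u);
}
```

Instantiating it with a single-byte type compiles but is a no-op, which is the
"pointless" case described next.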

So a "byte swapping" routine that accepts single-byte integer types is
either pointless or it is not typesafe because the information conveyed by
the input type is not enough to produce the expected output. For example,
should the sequence be treated as two-byte unsigned values or four-byte
signed integer values? There is no way to know based on the available type
information.

The implementation claimed to accept single-byte value
sequences instead of the multi-byte sequence it actually required (and then
relied on the caller to go along with this charade). A type-safe
implementation would eliminate such subterfuge - and would require no
complicity from its callers, because the types that a type-safe C++
routine asks for are the very same types that it can handle, and every one
of its callers may count on that fact.

So how would I make the system type-safe? What input would I require?
Iterators with a value_type of char[2]?

The iterator should dereference into an integer type - the size and signedness
of which is specified. The POSIX header file <stdint.h> contains suitable
typedefs that would work for this purpose: uint32_t, int64_t among many others.
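
One way such a requirement could be checked at compile time is sketched below,
assuming C++11. The trait name yields_multibyte_integer is hypothetical. It
inspects the iterator's value_type and reports whether it is an integer type
wider than one byte, so a byte-swapping routine could refuse char iterators
with a static_assert:

```cpp
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical trait: true when It dereferences to an integer type that
// occupies more than one byte (for example uint32_t or int64_t).
template <typename It>
struct yields_multibyte_integer
    : std::integral_constant<
          bool,
          std::is_integral<
              typename std::iterator_traits<It>::value_type>::value &&
              (sizeof(typename std::iterator_traits<It>::value_type) > 1)> {};

// Accepted: iterators over fixed-width multi-byte integers.
static_assert(
    yields_multibyte_integer<std::vector<std::uint32_t>::iterator>::value,
    "uint32_t sequences qualify");

// Rejected: iterators over single bytes.
static_assert(!yields_multibyte_integer<const char*>::value,
              "char sequences do not qualify");
```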

What does this do to usability? How would the user get such iterators from
byte streams? (Byte streams are what the user initially has, no way around
that.) How would I make this conversion (char -> char[2]) type-safe and
guarded against odd numbers of bytes?

It seems I can only move the issue somewhere else, not get rid of it.

Presumably, the data originated as a sequence of multi-byte values and were
serialized into a byte stream for transport or for storage. If that is the
case, then there are two separate operations that need to be performed: to
unserialize the byte stream to reconstitute the original multi-byte values
and then to transform those values by transposing their bytes in a uniform
manner.

Now, of course it is possible to unserialize a byte stream in such a way
that the reconstituted values wind up with the desired "endianness" -
but combining two operations that really have nothing to do with each other
strikes me as a false economy. In my opinion, a more flexible, less
complicated and ultimately more robust approach would be to have the
"deserializer" accept a byte stream and produce a stream of multi-byte
values, and have the "byte-swapper" accept a sequence of multi-byte values
and produce a corresponding sequence of byte-swapped values of the same type as
its input.
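
The two-stage arrangement could look roughly like the sketch below, assuming
C++11. The function names deserialize_u16 and swap_bytes_u16 are hypothetical,
as are the choice of 16-bit values and of a stream that stores the low-order
byte first. The deserializer is the only place that has to deal with an odd
number of bytes; the byte-swapper never sees raw bytes at all:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stage 1 (hypothetical): reconstitute 16-bit values from a byte stream.
// Assumes the first byte of each pair is the low-order byte.
std::vector<std::uint16_t>
deserialize_u16(const std::vector<unsigned char>& bytes)
{
    if (bytes.size() % 2 != 0)
        throw std::invalid_argument("byte count is not a multiple of 2");
    std::vector<std::uint16_t> values;
    values.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2)
        values.push_back(
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    return values;
}

// Stage 2 (hypothetical): swap the two bytes of every value.
// The output has exactly the same element type as the input.
std::vector<std::uint16_t>
swap_bytes_u16(const std::vector<std::uint16_t>& values)
{
    std::vector<std::uint16_t> swapped;
    swapped.reserve(values.size());
    for (std::uint16_t v : values)
        swapped.push_back(static_cast<std::uint16_t>((v << 8) | (v >> 8)));
    return swapped;
}

// Usage: the stages compose, but each can also be used on its own.
inline void pipeline_demo()
{
    const std::vector<unsigned char> bytes = {0x34, 0x12};
    assert(swap_bytes_u16(deserialize_u16(bytes)).front() == 0x3412);
}
```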

Greg

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]