Re: Merging Iterator Design

From: Greg Herlihy <greghe@pacbell.net>
Newsgroups: comp.lang.c++.moderated
Date: Sat, 17 Mar 2007 01:42:46 CST
Message-ID: <C2209C14.5014%greghe@pacbell.net>

On 3/15/07 3:44 PM, in article gmsnc4-8o3.ln1@satorlaser.homedns.org,
"Ulrich Eckhardt" <eckhardt@satorlaser.com> wrote:

Sebastian Redl wrote:

I'm working on an iterator that encodes and decodes endianness on the fly.
The basic idea is very simple: there is a base iterator whose value_type
is some octet type (e.g. uint8_t). The endian iterator wraps this base
iterator and, upon reading, takes the appropriate number of octets and
merges them into its own value type (typically uint16_t or uint32_t). Upon
writing, the iterator splits the value up and writes the individual octets
to the correct position.
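
As a rough illustration of the merge and split steps described above, here is
a minimal sketch for the big-endian case. The names read_big_endian and
write_big_endian are hypothetical and do not come from the thread, and T is
assumed to be an unsigned integer type such as uint16_t or uint32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the post): merge sizeof(T) octets,
// most significant octet first, into one unsigned value of type T.
template <typename T, typename OctetIt>
T read_big_endian(OctetIt it)
{
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i, ++it)
        value = static_cast<T>((value << 8) | static_cast<std::uint8_t>(*it));
    return value;
}

// Hypothetical helper: split `value` into sizeof(T) octets and write
// them through `it`, most significant octet first.
template <typename T, typename OctetIt>
void write_big_endian(T value, OctetIt it)
{
    for (std::size_t i = sizeof(T); i > 0; --i, ++it)
        *it = static_cast<std::uint8_t>((value >> (8 * (i - 1))) & 0xFF);
}

// Usage check: read two octets as one value, then write one back.
inline void endian_round_trip_demo()
{
    std::uint8_t octets[4] = {0x12, 0x34, 0x56, 0x78};
    assert(read_big_endian<std::uint16_t>(octets) == 0x1234);
    write_big_endian<std::uint16_t>(0xABCD, octets);
    assert(octets[0] == 0xAB && octets[1] == 0xCD);
}
```

An endian iterator along the lines described would presumably call something
like these helpers from its dereference and assignment operations, advancing
the base iterator by sizeof(T) octets per step.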

There are some issues with this.

[...]

The second issue is that of range size. To work correctly, the size of the
underlying range must be a multiple of the number of octets in the endian
iterator's value type.

[...]

One way to solve this issue is to simply make proper size a requirement
and declare everything else undefined behaviour.

I'd go for this solution. In practice, any such faulty use would be detected
by the underlying iterator, assuming it uses a checked implementation. All
major C++ standard libraries have such a mode.
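
The "make proper size a requirement" option could be spelled out as a
precondition that is checked only in debug builds. The sketch below is one
possible reading, not code from the thread; both function names are
hypothetical, and N stands for the number of octets in the endian iterator's
value type.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper: true when [first, last) holds a whole number
// of N-octet elements.
template <std::size_t N, typename ForwardIt>
bool is_whole_number_of_elements(ForwardIt first, ForwardIt last)
{
    return std::distance(first, last) % static_cast<std::ptrdiff_t>(N) == 0;
}

// Hypothetical helper: number of N-octet elements in [first, last).
// The size requirement is a documented precondition; the assert only
// catches violations in debug builds (it vanishes under NDEBUG).
template <std::size_t N, typename ForwardIt>
std::size_t element_count(ForwardIt first, ForwardIt last)
{
    assert(is_whole_number_of_elements<N>(first, last) &&
           "range size must be a multiple of N");
    return static_cast<std::size_t>(std::distance(first, last)) / N;
}
```

A release build pays nothing for the check, which matches the idea of leaving
a badly sized range as undefined behaviour.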

The interface is at fault here for not accurately describing its
requirements. Specifically, the interface declares that it accepts a
sequence of single-byte values as its input, but in reality it requires a
sequence of multi-byte values in order to produce correct, byte-swapped
output.

This discrepancy also means that the interface is not "type-safe." A
type-safe version of this interface would accept only multi-byte value
sequences and would not accept single-byte value sequences (as it does
currently) only to wind up blindly scrambling their order.
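
To make the point concrete, here is a minimal sketch of what such a type-safe
interface might look like. It is only an illustration under my own
assumptions: byte_swap and byte_swap_range are hypothetical names, and only
uint16_t and uint32_t are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical helpers: reverse the octet order of one value.
inline std::uint16_t byte_swap(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

inline std::uint32_t byte_swap(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Hypothetical interface: swaps every element of [first, last) in place.
// It accepts only sequences of multi-byte values, so a range of single
// octets is rejected at compile time and no odd trailing byte can occur.
template <typename ForwardIt>
void byte_swap_range(ForwardIt first, ForwardIt last)
{
    typedef typename std::iterator_traits<ForwardIt>::value_type value_type;
    static_assert(sizeof(value_type) > 1,
                  "byte_swap_range needs a sequence of multi-byte values");
    for (; first != last; ++first)
        *first = byte_swap(*first);
}

// Usage check on a two-element sequence of 16-bit values.
inline void byte_swap_range_demo()
{
    std::uint16_t words[] = {0x1234, 0xABCD};
    byte_swap_range(words, words + 2);
    assert(words[0] == 0x3412 && words[1] == 0xCDAB);
}
```

Calling byte_swap_range on a range of uint8_t fails to compile, so the
mismatch is reported to the caller instead of scrambling the data.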

I'd also do it this way because it is hard to handle otherwise. Should the
sequence end when the last input element is consumed, or when the last
complete output element is consumed? Ending at the last complete output
element leaves elements in the input sequence; ending at the last input
element uses an incomplete output element. Both conditions should be checked
beforehand. This is also mostly in the spirit of the STL: don't sacrifice
performance to additional checks when they aren't necessary.

There is no reasonable way out of an impossible situation, and being left
with an odd byte after processing a sequence of multi-byte values is (as the
original poster noted) not a defined state according to the actual
constraints of the implementation. But the implementation itself is to blame
for its predicament: it claimed to accept single-byte value sequences instead
of the multi-byte sequences it actually required, and then relied on the
caller to go along with this charade. A type-safe implementation would
eliminate such subterfuge and would require no complicity from its callers,
because the types that a type-safe C++ routine asks for are the very types
that it can handle. Every one of its callers may count on that fact.

Greg
