Re: how to encode a float in base64?

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Tue, 18 Mar 2008 02:41:30 -0700 (PDT)

Message-ID:

<32a908c8-1adc-4c95-a872-9213b945f02a@t54g2000hsg.googlegroups.com>

On Mar 17, 8:09 pm, Paul Brettschneider <paul.brettschnei...@yahoo.fr>
wrote:

James Kanze wrote:

On Mar 17, 9:01 am, Paul Brettschneider <paul.brettschnei...@yahoo.fr>
wrote:

Jack Klein wrote:

[...]

Then you can memcpy() the float into the beginning of the
buffer. Then you can encode the 6 byte buffer.

Isn't using unions the idiomatic thing to do in this case?

It's formally undefined behavior (although I suspect that most
compilers support it). The standard sanctioned way is with
reinterpret_cast, but this is not without problems, and is
probably less portable than the union in practice. memcpy is
guaranteed to work, everywhere (and also avoids any alignment
issues which might otherwise crop up).
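
For illustration, a minimal sketch of the memcpy route. The names
floatToBytes and bytesToFloat are made up for this example, and the
bytes come out in the host's own order and format, so this only
round-trips on the same kind of machine:

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Copy the object representation of a float into a byte array.
// Byte order and format are whatever the host uses; this is not a
// portable wire format.
std::array<unsigned char, sizeof(float)> floatToBytes(float value)
{
    std::array<unsigned char, sizeof(float)> bytes;
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Inverse: rebuild a float from bytes produced on the same platform.
// src may point anywhere in a larger buffer; memcpy has no alignment
// requirement on its arguments.
float bytesToFloat(const unsigned char* src)
{
    assert(src != 0);
    float value;
    std::memcpy(&value, src, sizeof value);
    return value;
}
```

The resulting byte array is what would then be handed to the base64
encoder.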

Strange, I thought the memcpy() and casts have the problem of
strict aliasing.

According to the standards (C and C++), you're allowed to access
any object as an array of unsigned char (or any char type in
C++). Which means that strict aliasing can't be applied as soon
as one of the pointers involved is a char type.

In practice, I think some compilers overlook this point. (I seem
to recall hearing that g++ was one of them.) As I mentioned, for
whatever reasons, and regardless of what the standard says, I
think the use of a union is probably somewhat more portable than
the use of reinterpret_cast here, although it wouldn't surprise
me if either caused problems with some compilers. (On the other
hand, I've cast

The issue with memcpy is different: you've let a pointer escape
to a function. Either the compiler knows the semantics of
memcpy somehow, in which case, it knows that it's modifying your
object, or it doesn't, in which case, it has to assume that it
might modify your object. Something like:
    float f ;
    void* p = &f ;
    float* pf = static_cast< float* >( p ) ;
    *pf = ...
is definitely legal, well defined, and actually not that uncommon
in C code. And the compiler must assume that if you pass the
address of a float to memcpy (converting it implicitly to
void*), that memcpy might do something like the last two lines
internally. And unlike the case of reinterpret_cast to unsigned
char*, I've never heard of a compiler getting this one wrong.

My xdrstreams make no use of aliasing whatsoever, using
something like:

    bool isNeg = source < 0 ;
    if ( isNeg ) {
        source = - source ;
    }
    int exp ;
    if ( source == 0.0 ) {
        exp = 0 ;
    } else {
        source = ldexp( frexp( source, &exp ), 24 ) ;
        exp += 126 ;
    }
    unsigned long mant = source ;
    dest.put( (isNeg ? 0x80 : 0x00) | exp >> 1 ) ;
    dest.put( ((exp << 7) & 0x80) | ((mant >> 16) & 0x7F) ) ;
    dest.put( mant >> 8 ) ;
    dest.put( mant ) ;

to output a float. In the end, I think it's the only way to be
100% sure. (But I suspect that it may have an unacceptable
performance hit in some cases, although at least on a Sun Sparc,
it's not nearly as slow as it looks.)
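
A self-contained sketch of the same arithmetic approach, for anyone
who wants to try it. The function name encodeIeeeSingle and the
std::array return type are assumptions of this example, not part of
the xdrstream code, and it only handles zero and finite values in the
normal range of an IEEE single (no NaN, infinity or denormals; the
sign of negative zero is dropped):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Produce the four bytes of an IEEE 754 single, most significant byte
// first, using only arithmetic on the value (no aliasing, no memcpy).
std::array<unsigned char, 4> encodeIeeeSingle(float source)
{
    assert(std::isfinite(source));
    bool isNeg = source < 0;
    if (isNeg) {
        source = -source;
    }
    int exp = 0;
    unsigned long mant = 0;
    if (source != 0.0f) {
        // frexp yields a fraction in [0.5, 1); scaling it by 2^24 gives
        // a 24 bit integer whose top bit is the implicit leading one.
        mant = static_cast<unsigned long>(
            std::ldexp(std::frexp(source, &exp), 24));
        exp += 126;                     // IEEE bias, adjusted for frexp
        assert(exp > 0 && exp < 255);   // normal range only
    }
    std::array<unsigned char, 4> out;
    // Sign bit and the upper seven bits of the exponent.
    out[0] = static_cast<unsigned char>((isNeg ? 0x80 : 0x00) | (exp >> 1));
    // Lowest exponent bit, then the top seven stored mantissa bits
    // (the implicit leading one is masked off).
    out[1] = static_cast<unsigned char>(
        ((exp << 7) & 0x80) | ((mant >> 16) & 0x7F));
    out[2] = static_cast<unsigned char>((mant >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(mant & 0xFF);
    return out;
}
```

With this, 1.0f comes out as 3F 80 00 00 and -2.5f as C0 20 00 00 on
any host, whatever its native float format or byte order.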

Like this:

#include <iostream>
#include <cmath>
#include <algorithm>
#include <iterator>

int main()
{
        const size_t size = sizeof(double);
        union {
                double f;
                unsigned char s[size];
        } u;
        u.f = 4.0 * std::atan(1.0); // Pi
        std::cout << u.f << '\n';
        std::copy(&u.s[0], &u.s[size],
                  std::ostream_iterator<unsigned int>(std::cout, "-"));

        std::cout << std::endl;

}

On IA32:
3.14159
24-45-68-84-251-33-9-64-

On PA-RISC:
3.14159
64-9-33-251-84-68-45-24-

;)

On a Sun Sparc, if I try this somewhere in the middle of a
larger buffer (say at the second byte), I get a core dump:-).

AFAIK, the union guarantees that alignment is correct for all
of its members. The idea was to allocate the union on the
stack and copy from/to there.

OK. Exactly as you've written it, there's no problem. I
thought you were thinking more along the lines of artificially
placing the union over the buffer. (I've seen more than a few
programmers who try to do that.) When you define a variable
with a union type, of course, the compiler must ensure alignment
(or rather, it must ensure that you can access all of the
elements of the union without a core dump).
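
To make the distinction concrete, a small sketch of reading a double
from an arbitrary offset in a byte buffer without overlaying anything
on it. The name readDoubleAt is invented for the example, and the
bytes are assumed to be in the host's own format:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Read a double stored at any byte offset in a buffer. The copy goes
// into a properly aligned local variable, so the offset need not be
// a multiple of the alignment of double.
double readDoubleAt(const unsigned char* buffer, std::size_t offset)
{
    assert(buffer != 0);
    double value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}
```

Casting buffer + 1 to double* (or to a pointer to the union) and
dereferencing it is the variant that can trap on machines with strict
alignment requirements.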

But since it's undefined behaviour, the discussion is moot...

Unless you're more concerned about what compilers actually do
than about what the standard says:-).

(Correctly aligned, or using memcpy, the results are the
same as those of the PA-RISC.) Try it on just about any
mainframe, and you'll get still other values.

And since they use EBCDIC you can't simply use text
representation. Though I guess most sensible transport layers
can transform EBCDIC to ASCII on the fly.

I've often wondered a bit about this myself. Usually, as you
say, code translation takes place during file transfer. But
what happens on shared disks? Do mainframes even support
arbitrary disk sharing, say mounting a file system served by a
Unix machine? Somehow, I rather doubt it.

Note that some mainframes don't use 2's complement for integral
types either. (Unisys has two mainframe architectures. One is
36 bit 1's complement, the other 48 bit signed magnitude, with,
however, only 39 value bits in the 48, and no unsigned
arithmetic, so INT_MAX == UINT_MAX. I can imagine that more
than a few "portable" programs would have problems with one of
those.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34