Re: Array Size
 
On Wednesday, 3 July 2013 13:05:27 UTC+1, Gerhard Fiedler  wrote:
James Kanze wrote:
On Monday, 1 July 2013 19:02:45 UTC+1, Gerhard Fiedler  wrote:
James Kanze wrote:
[...] I will type-pun the floating point to
a uint32_t or uint64_t, [...]
What is a guaranteed (i.e. standard-conforming) way to type-pun?
There's always memcpy.  
So memcpy of trivially copyable types is guaranteed to work?
It's in the example in §3.9/2, which says:
    For any object (other than a base-class subobject) of
    trivially copyable type T, whether or not the object holds
    a valid value of type T, the underlying bytes (1.7) making
    up the object can be copied into an array of char or
    unsigned char.[40] If the content of the array of char or
    unsigned char is copied back into the object, the object
    shall subsequently hold its original value.
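To make the quoted guarantee concrete, here is a small sketch (the function name is hypothetical, not from the standard) that copies an object's bytes into an unsigned char array, clobbers the object, copies the bytes back, and checks that the original value is restored. It assumes a C++11 compiler for std::is_trivially_copyable and a type that is default constructible and comparable with ==.

```cpp
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical helper: performs the byte round trip described in the
// quoted paragraph and reports whether the object got its value back.
template <typename T>
bool round_trip_preserves_value(T value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "the guarantee only covers trivially copyable types");
    T object = value;
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &object, sizeof(T));   // object -> byte array
    object = T();                             // overwrite the object
    std::memcpy(&object, bytes, sizeof(T));   // byte array -> same object
    return object == value;
}
```

Note that a NaN would make the final == comparison fail even though the bytes are restored, so the check is only meaningful for values that compare equal to themselves.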
There's also the wording in §3.10/12:
    If a program attempts to access the stored value of an
    object through a glvalue of other than one of the following
    types the behavior is undefined[52]:
    [...]
    - a char or unsigned char type.
    [52] The intent of this list is to specify those
        circumstances in which an object may or may not be
        aliased.
Roughly speaking, if I have:
    void f( float* p1, double* p2 );
the compiler is free to assume that p1 and p2 do not alias when
compiling f.  If one of the pointers is a char* or an unsigned
char*, however, it can make no such assumption.
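A minimal sketch of what the char exception means for generated code (the function names are hypothetical): because pc is an unsigned char*, the compiler must allow for it pointing into *pi, so it cannot fold the final read to a constant. If pc were a float*, it could. The expected value is computed separately via memcpy, so the check does not depend on the machine's byte order.

```cpp
#include <cstring>  // std::memcpy

// The store through pc may modify a byte of *pi, so *pi must be reloaded.
int store_then_reload(int* pi, unsigned char* pc)
{
    *pi = 1;
    *pc = 0;      // allowed to alias *pi: unsigned char is on the list
    return *pi;   // cannot be compiled as "return 1"
}

// What the abstract machine must produce when pc points at the first byte
// of *pi, computed independently of store_then_reload.
int expected_after_clearing_first_byte()
{
    int value = 1;
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    bytes[0] = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```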
The compiler is smart enough to realize that I changed an
object "through the back door"?
It's smart enough to see that there were some conversions to
void* along the way, and to assume the worst.
If so, why is memcpy safe, but compilers may generate
wrong results when using reinterpret_cast (because they wrongly assume
that a given object didn't change)?
They can't generate wrong results if the reinterpret_cast is to
char* or unsigned char*.
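As an illustration of the safe direction (a sketch; the function name is hypothetical), a reinterpret_cast to const unsigned char* can be used to read the object representation of a double. The tests assume an IEEE 754 double, where +0.0 is all zero bits and 1.0 has the bytes 3f f0 followed by six zero bytes in some byte order.

```cpp
#include <cstddef>  // std::size_t
#include <cstdio>   // std::snprintf
#include <string>

// Formats the bytes of a double as hex, in memory order, reading them
// through an unsigned char*, which the aliasing rules explicitly allow.
std::string bytes_as_hex(const double& d)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&d);
    std::string out;
    for (std::size_t i = 0; i != sizeof d; ++i) {
        char buf[3];  // two hex digits plus the terminating '\0'
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(p[i]));
        out += buf;
    }
    return out;
}
```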
The whole issue is complex.  Most compilers claim to allow using
a union for type punning, but in practice, only if they see the
union.  And there is (or was, but I think it's still there)
a "bug" in g++, where the following code failed:
    int f( int* p1, float* p2 )
    {
        int result = *p1;
        *p2 = 3.14159;
        return result;
    }
    //  Called from a different translation unit:
    union U { int i; float f; };
    U u;
    u.i = 42;
    std::cout << f( &u.i, &u.f );
Technically, the standard says that this is well formed and
legal, since at no point does the abstract machine read anything
but the last-written member.  Practically, I suspect
that most compilers doing optimization based on the lack of
aliasing will have the same problem: an obvious optimization is
to "rewrite" f as:
    int f( int* p1, float* p2 )
    {
        *p2 = 3.14159;
        return *p1;
    }
Perfectly legal in the absence of aliasing.
At some point, someone is going to have to sit down and define
very exactly what is and is not guaranteed.  The optimizations
that can be performed when the compiler can determine that there
is no aliasing are too important to give up.  In the meantime,
from a quality of implementation (QoI) point of view, it seems
reasonable to expect type punning to work *if* the union or
reinterpret_cast is visible within the function being
translated.  The standard doesn't guarantee it, but if the
compiler can actually see the aliasing, it can easily take it
into account.  And it seems reasonable not to expect it to work
if the union or reinterpret_cast is not visible in the function,
even in cases like the example above, where the standard says
that the code is legal and well defined.
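When neither the union nor the cast can be relied on, the memcpy route mentioned earlier avoids the question entirely. A sketch of a generic helper follows; the name pun_cast is hypothetical. (C++20 later added std::bit_cast in <bit> for the same purpose.) The tests assume IEEE 754 float and double.

```cpp
#include <cstdint>      // std::uint32_t, std::uint64_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical helper: reinterprets the bits of `from` as a `To` by copying
// the object representation, so no object is ever accessed through a
// glvalue of the wrong type and the optimizer has nothing to get wrong.
template <typename To, typename From>
To pun_cast(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value
                      && std::is_trivially_copyable<From>::value,
                  "only trivially copyable types may be copied bytewise");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

Compilers generally recognize a fixed-size memcpy like this and turn it into a plain register move, so the cost is usually nil.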
But in this case, the code is fairly machine dependent anyway---not
just with regard to the integer sizes, but also the floating point
representation (and machines which don't use IEEE are still
widespread).  The *intent* of the standard is for reinterpret_cast to
work here (and I've used it successfully in the past); 
The thing with reinterpret_cast is type punning. Am I right in thinking
that if one of the involved types is a char*, there is no problem with
type punning?
Correct, but I was type punning between a double and a uint64_t.
I don't have the code any more, so I don't know the exact
circumstances, but it did work, even with g++.  (I think, but
I'm not sure, that there was never more than one reference to
the object in any single function.)  That is, I had something
like:
    void
    input( std::istream& source, double& d )
    {
        input( source, reinterpret_cast<uint64_t&>( d ) );
    }
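For comparison, a sketch of the same function written with memcpy instead of reinterpret_cast<uint64_t&>. The uint64_t overload of input is not shown in the post, so the version here is a hypothetical stand-in that reads eight bytes, most significant first (an assumed wire format); parse_double is a hypothetical convenience wrapper. The tests assume an IEEE 754 double.

```cpp
#include <cstdint>   // std::uint64_t
#include <cstring>   // std::memcpy
#include <istream>
#include <sstream>   // std::istringstream, used by parse_double
#include <string>

// Hypothetical stand-in for input( std::istream&, uint64_t& ):
// reads 8 bytes, most significant byte first.
void input( std::istream& source, std::uint64_t& value )
{
    value = 0;
    for ( int i = 0; i != 8; ++i ) {
        char byte;
        if ( !source.get( byte ) ) {
            return;  // stream is now in a failed state
        }
        value = (value << 8) | static_cast<unsigned char>( byte );
    }
}

// Same interface as in the post, but the bits are copied with memcpy
// rather than accessing the double through a uint64_t&.
void input( std::istream& source, double& d )
{
    static_assert( sizeof( double ) == sizeof( std::uint64_t ),
                   "assumes a 64-bit double" );
    std::uint64_t bits;
    input( source, bits );
    if ( source ) {
        std::memcpy( &d, &bits, sizeof d );
    }
}

// Hypothetical wrapper: parses a double from a byte string; returns 0.0
// if there are not enough bytes.
double parse_double( const std::string& bytes )
{
    std::istringstream source( bytes );
    double d = 0.0;
    input( source, d );
    return d;
}
```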
the general practice is for compilers to support a union.
It's sometimes difficult to find out from the compiler docs whether
that's supported. Does anybody have any pointers for MSVC and GCC?
There are some comments in
http://gcc.gnu.org/gcc-4.4/porting_to.html, which say "use
a union", and there's also the section "Casting does not work as
expected when optimization is turned on" in
http://gcc.gnu.org/bugs/, which finishes with "you can use
a union instead of a cast" (although in the code in question,
the union is just as much undefined behavior as the cast).
I seem to recall having seen something similar for MSVC, but
I can't remember where.
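For reference, the union form the GCC pages recommend looks roughly like the sketch below (the function name is hypothetical). ISO C++ does not define reading a member other than the last one written; GCC's documentation of -fstrict-aliasing states that such type punning is allowed provided the memory is accessed through the union type, as it is here. Nothing is assumed about MSVC's behavior, and the tests assume IEEE 754 float and a compiler that honors the GCC rule.

```cpp
#include <cstdint>  // std::uint32_t

// Union-based punning, with both the write and the read done directly
// through the union object, so the aliasing is visible to the compiler.
std::uint32_t float_bits_via_union( float f )
{
    static_assert( sizeof( float ) == sizeof( std::uint32_t ),
                   "assumes a 32-bit float" );
    union {
        float         as_float;
        std::uint32_t as_bits;
    } u;
    u.as_float = f;
    return u.as_bits;  // reads the other member: a documented GCC extension
}
```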
-- 
James