Re: Proposal to allow unions of any data type
skaller wrote:
Here is the outline of a proposal to remove the restriction
on declaring unions with constructible variants.
A restriction is not necessarily an impediment, so removing a
restriction from a programming language does not necessarily improve
that language. To qualify as an improvement in this case, the proposal
would need to explain how unions with constructible members would be
useful. Unions in general have some notable shortcomings, so expanding
the realm of unions would make those shortcomings more prevalent than
ever. For example, a union provides no protection against a program
storing a value in one member and reading it from another. And should
a program do so, it leaves the realm of defined behavior behind for
good. So what benefit does a union provide to a program that would
make such a risk worthwhile?
The answer turns out to be "none": not only is there no benefit from
a union to justify the risk of its misuse, but a union actually
provides no unique benefit at all to a C++ program (see below).
This proposal is not worded in Standardese; it's just an outline
in plain language. Motivation is also omitted (separate issue :)
In some cases what I recall the Standard saying may be wrong, so
any corrections are of course appreciated.
PROPOSAL (A):
* remove the restriction on declaring
unions with variants of constructible types
* add a rule which says the compiler will not
generate a default constructor, copy constructor,
copy assignment operator, or destructor for
a union with a variant of constructible type
..
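For comparison, C++11 later adopted a rule along the lines of PROPOSAL (A). A union may have members with non-trivial special member functions. The union's corresponding implicit default constructor, copy operations and destructor are then defined as deleted. A minimal sketch, assuming a C++11 compiler; the unions `Raw` and `Managed` are hypothetical:

```cpp
#include <string>
#include <type_traits>

// Hypothetical union: std::string has a non-trivial default constructor,
// copy constructor and destructor, so C++11 defines the matching
// implicit members of Raw as deleted.
union Raw {
    int i;
    std::string s;
};

static_assert(!std::is_default_constructible<Raw>::value,
              "implicit default constructor is deleted");
static_assert(!std::is_copy_constructible<Raw>::value,
              "implicit copy constructor is deleted");
static_assert(!std::is_destructible<Raw>::value,
              "implicit destructor is deleted");

// To use such a union, the programmer must write these members explicitly.
union Managed {
    int i;
    std::string s;
    Managed() : i(0) {}  // start with the trivial member active
    ~Managed() {}        // does NOT destroy s: the union cannot know which member is active
};

static_assert(std::is_default_constructible<Managed>::value,
              "user-provided constructor and destructor make Managed usable");
```

`Managed` still records nothing about which member is active. Destroying `s`, if it was ever constructed, is left to the code that uses the union.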
PROBLEM: the current C++ standard handles
the initialisation of a union using ctor-initialisers
poorly. This is an outstanding defect:
union X {
int a;
long b;
X(): a(1), b(2) {}
};
Here, a is set to 1, then b is set to 2.
Clearly this is absurd, but it is merely
a gratuitous stupidity rather than a
semantic problem. If the types are
constructible, however, this rule would be
entirely untenable:
union X {
std::string a;
std::vector<int> b;
X(): a("") {}
};
In this case, a is NOT "" after default construction;
rather, b is a zero-length vector, and constructing b
clobbers the value of a. This is because the standard
requires ALL the members to be initialised.
In the POD case, the default constructors are
trivial and don't do anything, so only explicit
initialisation has any effect.
What about assignment to the data member "b" after the constructor
initializes only "a"? Wouldn't "a" be clobbered in the same way?
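Under the C++11 rules for such unions, you cannot simply assign to `b` while `a` is the active member. Doing so would invoke `std::vector<int>::operator=` on an object whose lifetime has not begun, which is undefined behavior. The defined way to switch members is to destroy the active one and construct the other in place. A sketch under those assumptions; `StrOrVec` and `switch_to_vector` are hypothetical names:

```cpp
#include <new>
#include <string>
#include <vector>

// Hypothetical union (C++11 rules) holding either a string or a vector.
// The caller must track which member is active; the union itself does not.
union StrOrVec {
    std::string a;
    std::vector<int> b;
    StrOrVec() : a("") {}  // a is the active member after construction
    ~StrOrVec() {}         // the owner must destroy whichever member is active
};

// Switch the active member from a to b without assigning to b directly.
void switch_to_vector(StrOrVec& u, const std::vector<int>& v) {
    u.a.~basic_string();             // end the lifetime of the active member a
    new (&u.b) std::vector<int>(v);  // construct b in the same storage
}
```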
PROPOSAL (B):
* fix the rule for initialisation of
union components by constructors
to require either no ctor initialiser,
or exactly one ctor initialiser.
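C++11 enforces essentially this rule: a ctor-initialiser that names more than one member of the same union is ill-formed. Below is a sketch of the string/vector union from the example above, written that way. It assumes C++11, and the second constructor is a hypothetical addition:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Each constructor names exactly one member in its ctor-initialiser,
// so only that member is constructed and nothing clobbers it.
union X {
    std::string a;
    std::vector<int> b;
    X() : a("") {}                          // only a is constructed
    explicit X(std::size_t n) : b(n, 0) {}  // only b is constructed (hypothetical)
    ~X() {}                                 // the owner must destroy the active member
};
```

With this rule, a really is "" after default construction. However, the union still records nothing about which member is active, so the code using it must destroy the right member.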
To return to the central question: why is a union even needed to
represent an object whose value may be of one of several types? What
advantages would a union with constructible types provide that a C++
program could not otherwise attain using existing language facilities?
About the only plausible answer is efficient use of memory. But that
answer is inadequate, because unions actually make poor use of memory.
A union can easily waste a lot of memory because it must be large
enough to accommodate its largest member, no matter which member
actually stores the value. So whenever a value is stored in a member
smaller than the union's largest member, memory is wasted: the extra
bytes go unused. Given a wide variation in the sizes of its members
and a large number of union objects, the amount of memory wasted can
be substantial.
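A minimal sketch of the size argument; the union `Cell` and its members are hypothetical. A union is always large enough to hold its largest member. So `sizeof(Cell)` is at least `16 * sizeof(double)` even when only the one-byte member holds a value:

```cpp
// Hypothetical union whose members differ widely in size.
union Cell {
    char tag;            // 1 byte
    double samples[16];  // 16 * sizeof(double) bytes
};

// Every Cell reserves room for the largest member, whichever member is in use.
static_assert(sizeof(Cell) >= sizeof(double[16]),
              "a union is sized for its largest member");
```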
A C++ program can implement a variant class to play the same role as
the union, but one that uses memory far more efficiently than an
equivalent union. As an example, suppose a program needs to fill a
container with a mix of std::string and std::vector<int> values. The
first step would be to declare a base class with the common interface:
#include <vector>
#include <string>
#include <stdexcept>
// a variant class capable of storing a std::string
// or a std::vector<int>
class MultiItem
{
public:
virtual ~MultiItem() {}
virtual
std::vector<int>& GetVector()
{
// the invalid argument is the implicit
// "this" parameter
throw std::invalid_argument("No vector value");
}
virtual
std::string& GetString()
{
throw std::invalid_argument("No string value");
}
};
Next, declare two subclasses with a data member of the appropriate
type:
class StringItem : public MultiItem
{
public:
StringItem(const std::string& s) : mString(s) {}
virtual
std::string& GetString()
{
return mString;
}
private:
std::string mString;
};
class VectorItem : public MultiItem
{
public:
VectorItem(const std::vector<int>& v) : mVector(v) {}
virtual
std::vector<int>& GetVector()
{
return mVector;
}
private:
std::vector<int> mVector;
};
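A short usage sketch, assuming C++11 (`std::unique_ptr`, brace initialisation). It repeats the classes above, with each constructor named after its class, so that the block compiles on its own. `MakeItems` and `WrongAccessThrows` are hypothetical helpers:

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// The MultiItem hierarchy from the post, repeated so this sketch is self-contained.
class MultiItem {
public:
    virtual ~MultiItem() {}
    virtual std::vector<int>& GetVector() { throw std::invalid_argument("No vector value"); }
    virtual std::string& GetString() { throw std::invalid_argument("No string value"); }
};

class StringItem : public MultiItem {
public:
    explicit StringItem(const std::string& s) : mString(s) {}
    virtual std::string& GetString() { return mString; }
private:
    std::string mString;
};

class VectorItem : public MultiItem {
public:
    explicit VectorItem(const std::vector<int>& v) : mVector(v) {}
    virtual std::vector<int>& GetVector() { return mVector; }
private:
    std::vector<int> mVector;
};

// Hypothetical helper: fill a container with one item of each kind.
// Each unique_ptr deletes its item through the virtual destructor.
std::vector<std::unique_ptr<MultiItem>> MakeItems() {
    std::vector<std::unique_ptr<MultiItem>> items;
    items.push_back(std::unique_ptr<MultiItem>(new StringItem("hello")));
    items.push_back(std::unique_ptr<MultiItem>(new VectorItem(std::vector<int>{1, 2, 3})));
    return items;
}

// Hypothetical helper: returns true if asking for a string throws,
// i.e. the "wrong member" access has behavior chosen by the program.
bool WrongAccessThrows(MultiItem& item) {
    try {
        item.GetString();
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

Asking a `VectorItem` for its string throws `std::invalid_argument` instead of reading storage of the wrong type.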
Just as with a union, the C++ class MultiItem gives the program a
choice to access its value either as a std::string or as a std::vector<int>.
In both cases the mechanism by which the program "knows" which member
is valid has not been specified and is assumed to be the same in either
case. So far, the union and C++ class solution are more or less
comparable.
Turning now to memory use and type safety, a clear winner emerges. On
both counts, the hypothetical union solution falls well short of the
C++ class. The union allocates the same amount of memory for each
instance no matter the size of the stored value, whereas each C++
object allocates only as much memory as its concrete subclass needs to
store its value. In addition to being more memory efficient, the C++
class is notably safer than the union. Whereas accessing the "wrong"
member of a union is undefined behavior, the equivalent operation is
defined for the C++ class, and is defined by the program itself (in
this example, the program elects to throw an exception). Being leaner
and safer, the C++ class implementation leaves the hypothetical union
implementation with little to recommend it.
Greg
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]