Re: Fwd: Re: Useful applications for boolean increments?

From: Ivan Godard <ivan@ootbcomp.com>
Newsgroups: comp.lang.c++.moderated
Date: Fri, 2 Nov 2012 11:22:14 -0700 (PDT)
Message-ID: <k6v10c$pag$1@dont-email.me>

On 11/1/2012 9:26 AM, Daniel Krügler wrote:

[I apologize for the late response, but I had some severe problems with
the configuration of my news group reader]

On 2012-10-25 20:31, Ivan Godard wrote:

  > I frequently use increment over an enumeration, typically when
  > iterating over an array whose index set is an enum. This construct
  > is not native to C/C++, but with type traits that give lower/upper
  > bounds for enum types and a little metaprogramming you can write:
  >     enum E {f, g, h};
  >     array<int, E> a, b;
  >     forEach(x, thru<E>()) {
  >         a[x] = 17;
  >         b[x + 1] = 23;
  >     }

  > The metaprogramming ensures that a[5] is illegal. ++ and -- are
  > defined as the successor and predecessor operations in the natural
  > way, as are E ± integral and E - E (but of course not E + E) in the
  > obvious way. That is, the set of arithmetic operations is the
  > same as for pointers.

I agree that this looks like a useful tool. Am I correctly
understanding that this is a view of an enumeration type's range of
valid values (specified by the extreme values b_min and b_max in the
standard)?

Strictly speaking it is defined by the lwb and upb values supplied to
the macro that sets up the traits for the enumeration, and could be any
value coercible to a constexpr of the enum. In practice they are always
the extrema of the declared values of the enum's list. Neither I nor a
Google search is familiar with std::b_min/max.

The symbols b_min and b_max are defined in 7.2 [dcl.enum] (I'm using
underscore _ to indicate a subscript):

"for an enumeration where e_min is the smallest enumerator and e_max is
the largest, the values of the enumeration are the values in the range
b_min to b_max, defined as follows: Let K be 1 for a two's complement
representation and 0 for a one's complement or sign-magnitude
representation. b_max is the smallest value greater than or equal to
max(|e_min| - K, |e_max|) and equal to (2^M) - 1, where M is a
non-negative integer. b_min is zero if e_min is non-negative and
-(b_max + K) otherwise. The size of the smallest bit-field large enough
to hold all the values of the enumeration type is max(M, 1) if b_min is
zero and M + 1 otherwise. It is possible to define an enumeration that
has values not defined by any of its enumerators. If the enumerator-list
is empty, the values of the enumeration are as if the enumeration had a
single enumerator with value 0."

Yes, this defines the width. However, it *doesn't* require the compiler
to expose the values of b_min/b_max to the program, which is what I'm
looking for.

It would be very nice if these extrema and some of the other information
well known to the compiler but hidden by the language were exposed to
the user, instead of requiring manual maintenance of traits. The most
badly needed IMO is an array of strings containing the print names of
the enumerators.

I agree that deducing this information via some "reflection" mechanism
would be useful.

There are two issues being confused here. The first, and my concern, is
functionality or the lack thereof. The second is the legacy of C and its
lack of any functionality that would treat an enum as more than the
collection of #defines that was all C had at the beginning.

I'm not sure that C++ will really change the good old C enums more than
necessary, since you can use enum classes in C++11. For these enums the
value range is *exactly* identical to the value range of the underlying
type of the enum (which, in turn, can be queried via the trait
std::underlying_type).

I do not suggest changing C enum; it is what it is and every language
has some burden of compatibility.

It's enum class that concerns me. Making the value range be the same as
for the underlying type is a mistake. The underlying type is a
representation, not a value set. It is common to see a three-valued enum
lodged by itself in a four-byte MMIO word. The value set lets the
compiler complain (usefully) if an invalid value is assigned, while the
representation (usefully) determines the physical layout in structs and
MMIO. These are different notions, and should not be conflated.

In my posting to the C++ group I advocated extending the representation
specification to any type, and admitting value sets for numeric objects.
It is as meaningful to be able to say:
    enum class num3 : short {1,3,5};
as to say:
    enum class enum3 : short {a,c,e};
They both physically occupy two bytes (or whatever short is; don't get
me started) and have delimited explicit value sets. Consequently:
    num3 numv1 = 3; // good
    num3 numv2 = 4; // error
    num3 numv3 = enum3::a; // error
    enum3 ev1 = enum3::a; // good
    enum3 ev2 = 1; // error
and no, you shouldn't be able to say:
    enum class numx : short {1 = 2, 3 = 4, 5 = 6};
even though the compiler could easily produce the mapping table :-)

I recognize that having the value set be co-extensive with that of the
underlying type means that a check at coercion is unnecessary. This is a
bug, not a feature. If the programmer is taking the time to write a
bounded type then he wants a bounded type and wants the run time to
verify that it is not out of range. A programmer who wants to avoid the
check and trust the rest of the program to be bug free can write:
    enum class bad : unsigned char {
        dummy0 = 0, a, b, c, dummy255 = 255};
and any reasonable compiler will omit the meaningless check. And yes, I
realize that unsigned char is not necessarily 0..255, but I asked you
not to get me started, remember :-)

Note that *if* you are interested in traversing over the enumerators of
an enum, you seem to make special assumptions, because there is no
guarantee that they are ascending, or even unique, values. I emphasize
this because I think that your iteration facility depends on those
guarantees. As you say above, the programmer typically defines the bounds
as the first and last values, but I think there is more than one
reasonable choice here. But you know that.

I do not iterate over the enumerators; I iterate over the value set.
If a language permits anonymous enumerators as C does (and it should not,
IMO), then I will iterate over the anonymous ones too. If the language
does not permit anonymous enumerators, then I will iterate only over the
named enumerators. If the language permits aliased names for enumerators
(and it should not, IMO), then those are visited once, not once for each
alias. Enumeration is in value set order, not textual order.

The exact same rules should apply to types for which the value set is
specified by range rather than enumeration: iteration covers the value
set, and coercion checks against the value set, at runtime if necessary.

I have strong opinions regarding the functionality, and our code goes a
long way toward making the enum concept useful via some tortuous
metaprogramming and traits classes. I can offer, and have offered, the
improved functionality to the language.

The next round of language evolution has just started, so everything
that didn't make it in yet has a new chance now, if someone argues in
favour of it. If you have an existing proposal paper in mind, could you
please send the corresponding issue number or a link to it?

I don't follow the language review process. At your suggestion I did
submit an informal complaint/proposal to the language group mailing
list. There was some idle interest, but no follow-up.

The present language contains legacy cow flops from that distant day.
Among other holdovers is the use of zero as a null pointer,

Agreed, and the core working group seems to be willing to reduce that
usage, see e.g.

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#903

Now that nullptr exists, there are few remaining reasons for this idiom.

the elision
of "if (foo != 0)"

Agreed.

to "if (foo)",

I strongly disagree regarding the "if (foo)" idiom. There is no
evidence that anyone is willing to deprecate or remove it from the
language. This idiom is very popular and well understood.

Oh, I doubt there is any clamor to deprecate this; it's just
an example of poor original design IMO, YMMV.

and the coercion rules for bool and
enums.

All of these idioms are deprecated in our code. They are removed in code
reviews and in some cases by tools or metaprogramming. Heightened
readability results: if you see "if (foo)", then you know that foo is a
boolean variable, not a pointer.

This may be true for your working group, but I think that this strict
interpretation is not very widely accepted.

Agreed. Puritanism is not widely popular, especially among beginners. :-)

If you see "switch(foo)" you can be
confident that all the case labels are actually enumerators of the class
of foo and not some other enumeration or some macroed integer.

switch is different and does not perform "contextual conversion to bool"
as if or while do.

Recently the standard took some halting steps toward cleaning up the
zero-as-null issue, and introduced enum class so that at least spurious
name collisions among enumerators could be avoided. Kudos. Unfortunately,
we now have yet more half measures that merely complicate compiler and
mind without being functional.

I'm not sure whether I correctly understand this: are you criticizing
that old enums are still supported? If you prefer enum classes, why not
use them and leave the old ones alone?

No, old enums are an unfortunate legacy and should be left alone. We
recently tried to convert all of our enums to enum class but failed;
enum class in gcc 4.6 is too buggy to use. We'll try again when we cut
over to the next compiler. However, even with enum class we are still
having to roll our own traits and iterators, both of which should be
native to the language. Whether these should also be supported (or even
can also be supported) for legacy enums I don't know.

I do not believe that it would be possible or desirable to eliminate the
legacy idioms you list, breaking billions of lines of code. I feel that
the only pragmatically possible solution is to introduce a new properly
designed and functionally complete structure in parallel with the legacy
facilities, in the way that enum classes (scoped enums) were introduced.

So you suggest introducing a second bool type? But given your statement
above, wouldn't that have exactly the effect you describe above as:

"Unfortunately, we now have yet more half measures that merely
complicate compiler and mind without being functional."

Yes. The whole mess should have been cleaned up back when the bool and
enum keywords were introduced. According to apocryphal history, there
was support for making enums fully featured that got shot down by
Neanderthals who resisted anything but named integers, but I wasn't
involved.

Enum class is not so widespread today that it cannot be amended and
extended still, and it should be. It should be merged with the class
concept: constructors, inheritance and all. The syntactic sugar of a
convenient notation for creating a set of unique values should be
retained and extended to classes in general. And the standard should
expose metadata for reflective programming: there is no excuse for not
making the print names of enumerators available, nor for not making the
types and names of the set of data members available. The machinery is
there in the language to do so, and forcing users to do it by hand is
unfriendly in the extreme.

These are a lot of things and presumably there are some different issues
involved.

Returning to bool, it might be possible to mangle the standard to make
bool a proper enumeration while leaving the legacy cow pats, but I doubt
it. Hence I suggest adding "enum class boolean" to the language with
either a conversion function or an explicit constructor for cow flop
compatibility. Making "if" also accept the new enum class should be easy
in the standard. With enum class fixed, there should not be any added
work in the compilers to handle enum boolean because it's just another
enum class and need not be special cased except for "if".

OK, this seems different from what I first understood; it looks more
like a new boolean type than an "enforce-it-into-some-enum" bool type.
I'm not sure why these type families need to be connected. Your problems
could be solved by introducing a new type "bool class" with the
properties you describe without any relation to enums.

Yes, that's true. However, if enum were fixed then type bool would become
merely a standard predeclared enumeration, and it could be removed
wholesale from the language standard text. I'm sure the committee would
welcome being able to take things out rather than having to put more in,
and debug the insertion :-)

I have no idea how much support your ideas get, but I think it really
needs a complete proposal to better understand the consequences of what
you are suggesting.

I know what I want to be able to use, but I am not enough of a C++
specification maven to propose specific changes to the standard, nor do
I have the effort available to become one. Want to do it together?

For example, I would like to have a completely general facility
applicable to any type which creates a set of named singletons of that
type using a convenient notation, and precludes making any more values
of the type by any means. Example usage:

    struct poles {
       poles(float vv, float vh) : v(vv), h(vh) {}
       float v, h;
       } {
        north(1.0, 0.0),
        south(-1.0, 0.0),
        east(0.0, 1.0),
        west(0.0, -1.0)
        };

The syntax would involve adding "singleton_list_opt" to the end of every
type constructing action, where:
    singleton_list_opt ::= <empty> | { singleton_list }
    singleton_list ::= singleton | singleton_list , singleton
    singleton ::= identifier constructor_opt
and so on. However, I lack the familiarity either to express this simple
BNF in standardese or to spot when a sample syntax would conflict with
other aspects of the language.

Singleton lists are an obvious first step toward folding enums into the
general type structure. The next step is to generalize the "underlying
type" idea so that representation clauses can be used with any type. Etc.

Ivan

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]