Re: Array irregularities in C++

From: "Vidar Hasfjord" <vattilah-groups@yahoo.co.uk>
Newsgroups: comp.lang.c++.moderated
Date: 9 Dec 2006 22:18:00 -0500
Message-ID: <1165703553.209611.249000@f1g2000cwa.googlegroups.com>

Seungbeom Kim wrote:

> Except the string literals. :)

True. It's a good example of another irregularity, though: you can
have one type of array literal, but not another.

Another peculiar irregularity of string literals is that
zero-termination is built in. That is, sizeof "Hello" is 6, since a
'\0' is appended. In my view zero-termination is a design choice better
left to the programmer (or standard library) than imposed by the core
language. It is perfectly reasonable to work with strings (or, more
generally, arrays) with no terminating element. In particular, a static
array or literal knows its size because the size is part of its type.
For example (hypothetical):

   typedef char AccessCode [4]; // sizeof AccessCode == 4.
   void foo (AccessCode a) { // No type decay - by value.
     AccessCode b = a; // Copies (char [4]) object.
     //...
   }
   AccessCode c = "42"; // Error - wrong size.
   AccessCode d = "*42*"; // OK - copy initialization.
   foo (d); // OK - pass by value.
   foo ("42"); // Error - wrong size.
   foo ("*42*"); // OK - pass by value.
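
For comparison, a class wrapper gets close to these semantics in
real C++. The following is a minimal sketch, assuming C++11's
std::array (std::tr1::array or boost::array would be the 2006
equivalents). The function names are illustrative only and not part
of the example above.

```cpp
#include <array>
#include <cassert>

// The terminator is part of the literal's type: "Hello" is const char [6].
static_assert(sizeof "Hello" == 6, "five characters plus the terminator");

// A wrapped array has a static size and does not decay to a pointer.
typedef std::array<char, 4> AccessCode;
static_assert(std::tuple_size<AccessCode>::value == 4,
              "the size is part of the type");

// Passed by value: the callee works on its own copy of all four chars.
char overwrite_first(AccessCode a) {
    a[0] = '#';  // modifies the copy only
    return a[0];
}

// Illustrative helper: four chars, no '\0' stored.
AccessCode make_code() {
    AccessCode d = {{'*', '4', '2', '*'}};
    return d;
}
```

Calling overwrite_first (make_code ()) leaves the caller's object
untouched, which is the by-value behaviour a built-in array parameter
does not give you.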

For a program that works with a large number of static strings
zero-termination is wasteful. For example, the following is currently
illegal:

   typedef char TypeTag [4];
   TypeTag types [] = {"INTR", "CHAR", "ACOD", "PSWD", ...};
   // error C2117: 'types' : array bounds overflow

You have to do:

   TypeTag types [] = {{'I','N','T','R'}, ...};
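
The brace-heavy form can be hidden behind a helper. The following is
a minimal sketch, assuming C++11's std::array. TypeTag is redefined
here as a std::array, and make_tag is a hypothetical helper, not an
existing library function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

typedef std::array<char, 4> TypeTag;

// Copies the first N - 1 characters of a literal, dropping the '\0'.
// The literal's length is deduced from its type, const char [N].
template <std::size_t N>
std::array<char, N - 1> make_tag(const char (&literal)[N]) {
    std::array<char, N - 1> tag;
    for (std::size_t i = 0; i + 1 < N; ++i)
        tag[i] = literal[i];
    return tag;
}

// A table of tags holding four chars per entry and no terminators.
const TypeTag types[] = {
    make_tag("INTR"), make_tag("CHAR"), make_tag("ACOD"), make_tag("PSWD")
};
```

A literal of the wrong length, such as make_tag ("42"), produces a
std::array <char, 2>, which does not convert to TypeTag, so the
mismatch is still caught at compile-time.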

All of this of course only applies to strings of static size. But in my
view they are the building blocks. Where dynamic size is needed you
pass pointer and size, preferably encapsulated in an abstraction such
as std::string.

> Can you explain what you mean by "static" and "dynamic" array types?

Although "static" has many overloaded meanings in C++, I've grown used
to the meaning "determined at compile-time". Conversely, I use
"dynamic" to mean "not known until run-time". I apply this
terminology to arrays as well as any other language construct.

I find this a very useful terminology, especially in the context of the
generic programming and meta-programming developments in C++ over the
last decade. The established view now is that the template feature is a
functional programming language in itself, working in the compile-time
domain (the static realm), and computing types and constants. More
support for static computations is proposed for C++0x, esp. constant
folding.
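
As a small illustration of the compile-time domain, a template can
compute a constant by recursion, with a specialization acting as the
base case. This is a minimal sketch; Factorial and buffer are
illustrative names, and static_assert assumes a C++11 compiler.

```cpp
#include <cassert>

// Recursive case: evaluated entirely by the compiler.
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

// Base case, selected by template specialization.
template <>
struct Factorial<0> {
    static const unsigned long value = 1;
};

// The result is a constant, so it can be checked and used statically.
static_assert(Factorial<5>::value == 120, "computed at compile time");

// A static array whose bound (24) comes from a static computation.
char buffer[Factorial<4>::value];
```

Nothing here exists at run-time except the 24-byte buffer; the
recursion itself is resolved during compilation.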

But I digress. I should probably be more careful with terminology with
regard to arrays. By static array I mean an array type that has static
size (using the meaning of static above). A literal is both a static
array and a *constant*; the latter means its elements are also
static, i.e. determined at compile-time.

Whenever you cross the static/dynamic (compile-time/run-time) boundary,
you have to store/pass along the static information. For example
(hypothetically regular):

   char (*s) [5] = new char [5]; // Regular (not C++).
   char* t = new char [5]; // Error - type mismatch.
   *s = "Hello"; // Copy.
   *s = "Hello!"; // Error - type mismatch.

   // Dynamic memory allocations need size info:
   const size_t buf_size = get_size_from_config_file ();
   char* buf = malloc <char> (buf_size);

   // Ordinary function - need size info:
   str_copy (buf, buf_size, *s, sizeof *s);

   // Template overload - deduces static size info:
   str_copy (buf, buf_size, *s);

   // No need for delete [], static size is deduced:
   delete s;

   // Dynamic memory deallocations need size info:
   free <char> (buf, buf_size);
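
The two str_copy forms can be approximated in standard C++ today.
The following is a minimal sketch under assumptions the example
above does not specify: the return value and the truncating
behaviour are choices made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Ordinary function: both sizes cross the boundary as run-time values.
// Copies at most dst_size chars and returns the number copied.
std::size_t str_copy(char* dst, std::size_t dst_size,
                     const char* src, std::size_t src_size) {
    const std::size_t n = src_size < dst_size ? src_size : dst_size;
    std::memcpy(dst, src, n);
    return n;
}

// Template overload: N is deduced from the static array type of src,
// so the caller no longer passes the source size by hand.
template <std::size_t N>
std::size_t str_copy(char* dst, std::size_t dst_size, const char (&src)[N]) {
    return str_copy(dst, dst_size, src, N);
}
```

The overload works because the source is taken by reference to
array, which suppresses the usual decay to a pointer.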

Note that a regular language would change the semantics of "new" for
array types. This would eliminate the need for "new []" and "delete
[]". But a new mechanism for allocating memory of a given size at
run-time would be needed, because "T a [n]", where "n" is a variable,
is illegal. This is "malloc" above. It could be implemented as
follows, based on the view that memory is a static array:

   template <class T>
   T* malloc (size_t n) {
     // typedef char Memory [size_t];
     size_t i = std::allocate (n * sizeof (T));
     return std::ptr <T> (i);
   }

Hence, there's no need to introduce "T a [n]" (VLA) into the language.
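
In current C++ the closest approximation is a typed wrapper over
std::malloc. The following is a minimal sketch; typed_malloc and
typed_free are hypothetical names, chosen to avoid clashing with the
real malloc and free, and the sketch is only meant for trivial types
such as char.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocates raw storage for n objects of type T. The size is a run-time
// value, so the caller has to keep it. Returns a null pointer on failure.
// No constructors are run, and n * sizeof(T) is not checked for overflow.
template <class T>
T* typed_malloc(std::size_t n) {
    return static_cast<T*>(std::malloc(n * sizeof(T)));
}

// The size parameter mirrors free <char> (buf, buf_size) above;
// std::free itself does not need it.
template <class T>
void typed_free(T* p, std::size_t /*n*/) {
    std::free(p);
}
```

The returned pointer carries no size information, which is exactly
why the size has to be passed along separately.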

> > ptr <int> pi; // int*
>
> What benefits, other than the declaration syntax regularity,
> does this give you over the plain pointers?

Now you're asking me to diverge from the thread topic. I think
regularity in itself has many virtues, many of which have been pointed
out before in this thread and elsewhere. To reiterate: teachability,
usability, simpler compilers, easier tool making, etc. Hence, I think
it is worth considering in its own right. Here's my hypothetical
example above in nice regular syntax:

   alias S = array <char, 5>; // Convenience.
   auto s = new S; // Type deduction.
   *s = "Hello"; // Copy.

   // Dynamic allocations need size info:
   auto buf_size = get_size_from_config_file ();
   auto buf = malloc <char> (buf_size);

The rest of the example is identical. Another nicety of regular syntax
is the regular specification of type modifiers such as const:

   const ptr <int> cpi; // int* const cpi;
   ptr <const int> pci; // const int* pci; // or
                        // int const* pci;

   const ptr <volatile array <const int>> cpvaci;

I'll leave the equivalent C++ declaration for the last one as an
exercise for the reader... :-)
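
Part of this notation can be emulated with alias templates. The
following is a minimal sketch, assuming C++11; the namespace name
regular is illustrative, and the aliases only rename the built-in
types, so they do not change the semantics of "new" or of array
decay.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace regular {

template <class T> using ptr = T*;
template <class T, std::size_t N> using array = T[N];

// const applies to the whole aliased type, i.e. to the pointer itself.
static_assert(std::is_same<const ptr<int>, int* const>::value,
              "const ptr <int> is int* const");
static_assert(std::is_same<ptr<const int>, const int*>::value,
              "ptr <const int> is const int*");

// Pointer to a static array, without the inside-out declarator.
static_assert(std::is_same<ptr<array<char, 5>>, char (*)[5]>::value,
              "ptr <array <char, 5>> is char (*) [5]");

// The irregularity remains: new of an array type yields a pointer to
// the first element, not a pointer to the array.
static_assert(std::is_same<decltype(new array<char, 5>), char*>::value,
              "new char [5] has type char*");

}  // namespace regular
```

The last assertion shows the limit of the emulation: the notation
becomes regular, but the core language rules underneath do not.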

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]