Re: Array irregularities in C++
Seungbeom Kim wrote:
> Except the string literals. :)
True. It's a good example of another irregularity, though: you can
have one kind of array literal, but not another.
Another peculiar irregularity of string literals is that
zero-termination is built in. That is, (sizeof "Hello") is 6, since a
'\0' is appended. In my view, zero-termination is a design choice better
left to the programmer (or the standard library) than imposed by the core
language. It is perfectly reasonable to work with strings (or, more
generally, arrays) with no terminating element. In particular, a static
array/literal knows its size, because the size is part of its type. For
example (hypothetical):
typedef char AccessCode [4]; // sizeof AccessCode == 4.
void foo (AccessCode a) { // No type decay - by value.
AccessCode b = a; // Copies (char [4]) object.
//...
}
AccessCode c = "42"; // Error - wrong size.
AccessCode d = "*42*"; // OK - copy initialization.
foo (d); // OK - pass by value.
foo ("42"); // Error - wrong size.
foo ("*42*"); // OK - pass by value.
For a program that works with a large number of static strings,
zero-termination is wasteful. For example, the following is currently
illegal:
typedef char TypeTag [4];
TypeTag types [] = {"INTR", "CHAR", "ACOD", "PSWD", ...};
// error C2117: 'types' : array bounds overflow
You have to do:
TypeTag types [] = {{'I','N','T','R'}, ...};
All of this, of course, applies only to strings of static size. But in my
view they are the building blocks. Where dynamic size is needed, you
pass pointer-and-size, preferably encapsulated in an abstraction such
as std::string.
> Can you explain what you mean by "static" and "dynamic" array types?
Although "static" has many overloaded meanings in C++, I've grown used
to the meaning "determined at compile time". Conversely, I use
"dynamic" to mean "not known until run time". I apply this
terminology to arrays as well as to any other language construct.
I find this terminology very useful, especially in the context of the
generic programming and metaprogramming developments in C++ over the
last decade. The established view now is that the template feature is a
functional programming language in itself, working in the compile-time
domain (the static realm) and computing types and constants. More
support for static computations is proposed for C++0x, especially
constant folding.
But I digress. I should probably be more careful with terminology with
regard to arrays. By static array I mean an array type that has static
size (in the sense of static above). A literal is both a static
array and a *constant*, the latter meaning that its elements are also
static, i.e. determined at compile time.
Whenever you cross the static/dynamic (compile-time/run-time) boundary,
you have to store/pass along the static information. For example
(hypothetically regular):
char (*s) [5] = new char [5]; // Regular (not C++).
char* t = new char [5]; // Error - type mismatch.
*s = "Hello"; // Copy.
*s = "Hello!"; // Error - type mismatch.
// Dynamic memory allocations need size info:
const size_t buf_size = get_size_from_config_file ();
char* buf = malloc <char> (buf_size);
// Ordinary function - need size info:
str_copy (buf, buf_size, *s, sizeof *s);
// Template overload - deduces static size info:
str_copy (buf, buf_size, *s);
// No need for delete [], static size is deduced:
delete s;
// Dynamic memory deallocations need size info:
free <char> (buf, buf_size);
Note that a regular language would change the semantics of "new" for
array types. This would eliminate the need for "new []" and "delete []".
But a new mechanism for allocating memory of a given size at
run time would be needed, because "T a [n]", where "n" is a variable,
is illegal. That is the role of "malloc" above. It could be implemented
as follows, based on the view that memory is a static array:
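template <class T>
T* malloc (size_t n) {
// Memory viewed as one big static array:
// typedef char Memory [size_t];
size_t i = std::allocate (n * sizeof (T)); // Hypothetical: index into Memory.
return std::ptr <T> (i); // Hypothetical: index to T*.
}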
Hence, there's no need to introduce "T a [n]" (VLA) into the language.
>> ptr <int> pi; // int*
> What benefits, other than the declaration syntax regularity,
> does this give you over the plain pointers?
Now you're asking me to diverge from the thread topic. I think
regularity in itself has many virtues, many of which have been pointed
out before in this thread and elsewhere. To reiterate: teachability,
usability, simpler compilers, easier tool-making, etc. Hence, I think
it is worth considering in its own right. Here's my hypothetical example
from above in nice regular syntax:
alias S = array <char, 5>; // Convenience.
auto s = new S; // Type deduction.
*s = "Hello"; // Copy.
// Dynamic allocations need size info:
auto buf_size = get_size_from_config_file ();
auto buf = malloc <char> (buf_size);
The rest of the example is identical. Another nicety of regular syntax
is that type modifiers such as const are specified uniformly:
const ptr <int> cpi; // int* const cpi;
ptr <const int> pci; // const int* pci; // or
// int const* pci;
const ptr <volatile array <const int>> cpvaci;
I'll leave the equivalent C++ declaration for the last one as an
exercise for the reader... :-)
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]