Re: Problem with array objects

From: "A. Bolmarcich" <aggedor@earl-grey.cloud9.net>
Newsgroups: comp.lang.c++
Date: Thu, 26 May 2011 14:38:34 -0500
Message-ID: <slrnittb1q.25so.aggedor@earl-grey.cloud9.net>

On 2011-05-25, Paul <pchristor@yahoo.co.uk> wrote:

"A. Bolmarcich" <aggedor@earl-grey.cloud9.net> wrote in message
news:slrnitqfvi.10rf.aggedor@earl-grey.cloud9.net...

[snip]

When you dereference a pointer to int you access the pointed to integer
object like so :
int x=5;
int* px = &x;
std::cout<< *px;
//this will output 5 because dereferencing px accesses the object it points to.

With an array the situation is not the same because an array cannot be
accessed as a whole. The only way we can point to an array is to point to
one of its elements.
int arr[3] = {1,2,3};
int* parr = arr;
int (*pparr)[3] = &arr;

std::cout<< *parr;
//outputs 1 because it points to the first element of the array.
std::cout<<*pparr;
//outputs a memory address because it points to an array-type object.

The situation with the unary * and unary & operators is the same for
an array and for a non-array. The C++ standard does not specify
different behaviors depending on whether the operand of the unary *
and unary & operators is an array or non-array.

Here is the paragraph from the C++ standard about the unary *
operator.

The unary * operator performs indirection: the expression to which
it is applied shall be a pointer to an object type, or a pointer to
a function type and the result is an lvalue referring to the object
or function to which the expression points. If the type of the
expression is "pointer to T", the type of the result is "T".
[Note: a pointer to an incomplete type (other than cv void ) can be
dereferenced. The lvalue thus obtained can be used in limited ways
(to initialize a reference, for example); this lvalue must not be
converted to an rvalue, see 4.1. ]

The C++ standard does not specify different behaviors for an array
and a non-array with the unary * operator.

Here is the paragraph from the C++ standard about the unary &
operator.

The result of the unary & operator is a pointer to its operand.
The operand shall be an lvalue or a qualified-id. In the first
case, if the type of the expression is "T", the type of the result
is "pointer to T". In particular, the address of an object of type
"cv T" is "pointer to cv T", with the same cv-qualifiers. For a
qualified-id, if the member is a static member of type "T", the
type of the result is plain "pointer to T". If the member is
a nonstatic member of class C of type T, the type of the result is
"pointer to member of class C of type T." [Example:

   struct A { int i; };
   struct B : A { };
   ... &B::i ... // has type int A::*

--end example] [Note: a pointer to member formed from a mutable
nonstatic data member (7.1.1) does not reflect the mutable
specifier associated with the nonstatic data member. ]

The C++ standard does not specify different behaviors for an array
and a non-array with the unary & operator.

A difference with array and non-array results is that
array-to-pointer conversion is applied to an array result.

In your example, the statement

int* parr = arr;

implicitly applies array-to-pointer conversion to the array result of
the expression arr. The result of that conversion is a pointer to
the first element of arr, not a pointer to arr. Because parr is a
pointer to int, the result of dereferencing it is an int.

In your example, the statement

int (*pparr)[3] = &arr;

initializes pparr with a pointer to arr, not a pointer to an element
of arr. Because pparr is a pointer to an array of int, the result of
dereferencing it is an array of int. Array-to-pointer conversion is
implicitly applied to that result and the result of the conversion
is a pointer to int that points to the first element of arr.
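
The two cases above can be checked with a small sketch. It assumes a C++11 or later compiler (for decltype and static_assert), and the function name deref_matches_description is made up for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: returns true when the behaviour described above holds.
bool deref_matches_description() {
    int arr[3] = {1, 2, 3};
    int* parr = arr;          // array-to-pointer conversion: points to arr[0]
    int (*pparr)[3] = &arr;   // no conversion: points to the array itself

    // *parr is an lvalue of type int; *pparr is an lvalue of type int[3].
    static_assert(std::is_same<decltype(*parr), int&>::value,
                  "dereferencing a pointer to int gives an int");
    static_assert(std::is_same<decltype(*pparr), int (&)[3]>::value,
                  "dereferencing a pointer to array gives an array");

    // Initializing an int* from *pparr applies array-to-pointer conversion
    // to the array result, giving a pointer to the first element.
    int* first = *pparr;

    return *parr == 1 && first == &arr[0] && first == parr;
}
```

The static_assert lines are checked at compile time, so the type claims hold if the sketch compiles at all. Only the pointer comparisons are evaluated at run time.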

An array identifier such as 'arr' is an array-type object. A pointer to this
object points to a single object, not to an array of this object type.

The result of using the identifier 'arr' in an expression is an
array. An array is a single object that contains sub-objects. The
expression &arr points to the object that is the array named arr.

Given the declaration

int arr[4];

a C++ implementation creates an array object to represent the array,
but it does not also create an object that stores a pointer to the
array object, unless one is explicitly present, say due to the
declaration

int (*pparr)[4] = &arr;

Due to that statement a C++ implementation creates a pointer to
array object that is initialized to point to the array. The
pointer points directly to the array object.
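
A minimal sketch of that point, assuming a C++11 or later compiler. The function name array_has_no_hidden_pointer is made up for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: checks that the array object holds only its elements
// and that the pointer to it is a separate, explicitly declared object.
bool array_has_no_hidden_pointer() {
    int arr[4] = {};
    int (*pparr)[4] = &arr;

    // The array object is exactly as large as its four elements, so it has
    // no room for an additional stored pointer.
    static_assert(sizeof(arr) == 4 * sizeof(int),
                  "an array of 4 int occupies 4 * sizeof(int) bytes");

    // The pointer to the array and the pointer to its first element have
    // different types but hold the same address.
    const void* whole = pparr;
    const void* first = &arr[0];

    return whole == first;
}
```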

The pointer pparr above points to a single object not an array of
objects.
Consider this:

int (*p)[3]=0;
std::cout<<*p<<std::endl;
std::cout<< typeid(*p).name()<<std::endl;
std::cout<< sizeof(*p);

Does the above pointer point to a valid object?
Or is it completely UB because it's dereferencing a null pointer?

Having the value of a pointer be the null pointer is valid. The
effect of dereferencing the null pointer is undefined.
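
As far as I can tell, only the first output statement in the quoted code actually evaluates *p. The operand of sizeof is never evaluated, and the operand of typeid is evaluated only when it is a glvalue of polymorphic class type, which int[3] is not. A sketch of the two unevaluated uses follows; the function name unevaluated_uses_are_fine is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>

// Hypothetical helper: uses *p only in contexts that do not evaluate it.
bool unevaluated_uses_are_fine() {
    int (*p)[3] = 0;   // holding a null pointer value is valid

    // sizeof does not evaluate its operand, so p is not dereferenced here.
    std::size_t bytes = sizeof(*p);

    // *p has the non-polymorphic type int[3], so typeid uses only the static
    // type of the expression and does not evaluate it.
    const std::type_info& info = typeid(*p);

    // Deliberately omitted: std::cout << *p; would evaluate *p through a
    // null pointer, and the effect of that is undefined.

    return bytes == 3 * sizeof(int) && info == typeid(int[3]);
}
```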

A few followups ago I posted the code generated by a GNU C++ compiler
to show how an array object and a pointer to an array object were
implemented. I don't know of any compiler that adds an object that
stores a pointer to the array for each array. I don't know of
anything in the C++ standard that requires an object that stores
a pointer to an array for each array. If you do, please provide
details.

An array object must store a pointer otherwise how does it know, where in
memory, the array is?

The post you have replied to up till here was not a post by me.
I believe the following is addressed towards me.

The post that I replied to, and that is quoted above in the lines
starting with ">>>", was posted by you. If the newsreader you use
has a function to go back to the post that a reply is to, use that
function to go back 3 replies to get to the post in which the lines
above starting with ">>>" were originally written. Otherwise, open
http://groups.google.com/group/comp.lang.c++/msg/c53faaea6144e685?hl=en

In a previous post you asked: "So where does the memory address value
come from? Its not stored in the array of integer objects." My
answer was (see
http://groups.google.com/group/comp.lang.c++/msg/90a32f760cdfc958?hl=en)

Where the memory address comes from depends on where the
implementation decides to store the array. For example, an object
with automatic storage duration, such as a non-static array declared
in a function, is allocated on the stack in an implementation that
uses a stack for automatic storage.

The compiler knows the compile-time constant offset in the stack
frame where it has decided to store the array. In places where a
program needs the memory address of the array, the compiler puts
in instructions to sum that compile-time constant offset and the
current value of the stack pointer.

In the last sentence, "stack pointer" should have been "stack
frame pointer".

For the program

void foo() {
   int arr[4], (*pparr)[4];

   pparr = &arr;
}

the assembler output of the GNU C++ compiler for the assignment
statement for an i686 system is

         leal -20(%ebp), %eax
         movl %eax, -4(%ebp)

This is a very tiny piece of code and the compiler is allowed to optimise
this.
Look at some asm code where an array is passed to a function and you will
see what the value pushed onto the stack is.
Here is a simple program:

void foo(int* p){ p[0]=7;}

int main(){
int arr[5]={0};
foo(arr);
}

When an array is used as an argument to a function, array-to-pointer
conversion occurs and what is passed is a pointer to the first
element of the array. In what I posted no optimizations were done.
If optimizations had been done, the body of the function foo that I
gave could have been eliminated, including the local variables of
the function.
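
The conversion on the argument can be observed from inside C++ with template argument deduction. This is a sketch assuming C++11 or later. The two function templates and argument_decays are made-up names; foo is the function from your program.

```cpp
#include <cassert>
#include <type_traits>

// By-value parameter: the array argument undergoes array-to-pointer
// conversion, so T is deduced as int*.
template <typename T>
bool by_value_sees_pointer(T) { return std::is_same<T, int*>::value; }

// By-reference parameter: no conversion is applied, so T is deduced as the
// array type int[5].
template <typename T>
bool by_reference_sees_array(T&) { return std::is_same<T, int[5]>::value; }

void foo(int* p) { p[0] = 7; }

// Hypothetical helper that ties the checks together.
bool argument_decays() {
    int arr[5] = {0};
    foo(arr);   // passes a pointer to arr[0], so foo writes into arr itself
    return arr[0] == 7
        && by_value_sees_pointer(arr)
        && by_reference_sees_array(arr);
}
```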

And here is the asm output:

; Listing generated by Microsoft (R) Optimizing Compiler Version 14.00.50727.762

TITLE C:\cpp\public.cpp
.686P
.XMM
include listing.inc
.model flat

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC ?foo@@YAXPAH@Z ; foo
; Function compile flags: /Odtp
_TEXT SEGMENT
_p$ = 8 ; size = 4
?foo@@YAXPAH@Z PROC ; foo
; File c:\cpp\public.cpp
; Line 3
push ebp
mov ebp, esp
; Line 4
mov eax, DWORD PTR _p$[ebp]
mov DWORD PTR [eax], 7
; Line 5
pop ebp
ret 0
?foo@@YAXPAH@Z ENDP ; foo
_TEXT ENDS
PUBLIC _main
; Function compile flags: /Odtp
_TEXT SEGMENT
/*************************************/
_arr$ = -20 ; size = 20

/************************************/
The above line is the array type object.
This is a pointer in asm because array type objects do not exist in asm.
/************************************/

The -20 is neither an array type object nor a pointer. It is an
offset in the stack frame where the C++ compiler has allocated the
array object. If you think it is an array type object in which a
pointer to an array is stored, have the C++ program output the
value stored in that array type object and see if the result is -20.

To a C++ compiler arr is an array of 5 int. In the assembler program
generated by the C++ compiler, arr is 20 bytes located at an offset
of -20 in a stack frame. That 20 bytes is the array object; the
type of that object is array of 5 int.
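
One way to look at what is stored in the array object is to copy its bytes out and inspect them. This is a sketch, and the function name array_stores_only_elements is made up for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies the bytes of the array object and checks that
// they hold the element values and nothing else.
bool array_stores_only_elements() {
    int arr[5] = {0};
    arr[0] = 7;

    int copy[5];
    static_assert(sizeof(copy) == sizeof(arr), "same object size");
    std::memcpy(copy, &arr, sizeof(arr));   // all 20 bytes where int is 4 bytes

    // The stored bytes are the five element values: no -20 and no address.
    bool rest_is_zero = true;
    for (int i = 1; i < 5; ++i) {
        rest_is_zero = rest_is_zero && copy[i] == 0;
    }
    return copy[0] == 7 && rest_is_zero;
}
```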

_main PROC
; Line 7
push ebp
mov ebp, esp
sub esp, 20 ; 00000014H
; Line 8
mov DWORD PTR _arr$[ebp], 0
xor eax, eax
mov DWORD PTR _arr$[ebp+4], eax
mov DWORD PTR _arr$[ebp+8], eax
mov DWORD PTR _arr$[ebp+12], eax
mov DWORD PTR _arr$[ebp+16], eax
; Line 9
/**************************************/
lea ecx, DWORD PTR _arr$[ebp]
push ecx
/*************************************/
The above two lines push the address of the array's first element onto the
stack prior to invocation of foo.
/*************************************/

That is exactly what should happen due to the array-to-pointer
conversion done on the argument expression arr. The lea instruction
adds the offset where arr is in the stack frame to the stack frame
address in register ebp. That sum is placed where the function
being called expects its argument to be. Note that no value stored
in an object was loaded to determine the argument value.

call ?foo@@YAXPAH@Z ; foo
add esp, 4
; Line 11
xor eax, eax
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END

In the above asm listing arr is _arr$ , that is a pointer object that has
the value of -20.

_arr$ is a symbolic constant in the assembler source with a value of
-20, like

#define _arr$ -20

would be in C++, if the character $ were allowed in a preprocessor
identifier.

_arr$ is not a pointer object. An object is a region of storage where
the value of the object is stored. The object named arr is the 20-byte
region of storage starting at an offset of -20 in the stack
frame of the function in which arr is declared.

The compiler has allocated arr at offset -20 in the stack frame and
pparr at offset -4 in the stack frame. The assembler instructions
store in pparr the sum of -20 and the stack frame address.
Determining the address of arr did not use a value stored in an
object.
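
The same base-plus-constant-offset computation can be imitated in C++ with a struct standing in for the stack frame. This is only an analogy: the struct Frame and the function address_is_base_plus_constant are made up for this illustration, and a real stack frame layout is chosen by the compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a stack frame holding the two local variables.
struct Frame {
    int (*pparr)[4];
    int arr[4];
};

// Hypothetical helper: computes the address of arr the way the lea
// instruction does, as a base address plus a compile-time constant.
bool address_is_base_plus_constant() {
    Frame frame = {};
    frame.pparr = &frame.arr;

    // offsetof(Frame, arr) is a compile-time constant, like -20 in the
    // listing. No stored pointer is read to form the address.
    char* base = reinterpret_cast<char*>(&frame);
    void* computed = base + offsetof(Frame, arr);

    return computed == static_cast<void*>(frame.pparr);
}
```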

Your example was so simple that the compiler has optimised the array object
into a temporary literal (-20).

In my example the compiler did no optimizations. -20 was the offset
in the stack frame where the compiler allocated the array of 4 int
named arr. What is a temporary literal?

Here is an example program where the compiler cannot optimize the
array object, because the compiler does not know what the bar
function does with the pointer to the array.

  void bar(int (*)[4]);
  void foo() {
    int arr[4];
    bar(&arr);
  }

The generated assembler for the call is

          leal -16(%ebp), %eax
          movl %eax, (%esp)
          call __Z3barPA4_i

Note that no value stored in an object was loaded to determine the
address of arr, the argument passed to the function bar. There
is no array-type object whose value is the memory address of the
array.

[snip]

As shown in the asm listing _arr$ is an object with a value of -20. This is
a pointer object that points to the first element of the array, this object
stores the address of the array.

What the asm listing shows is that _arr$ is an assembler symbolic
constant with a value -20. _arr$ is not an object. _arr$ is not a
pointer. _arr$ is used to indicate to someone reading the assembler
listing that an operand like

_arr$[ebp]

refers to the variable named arr in the current stack frame. If the
literal offset of -20 were used in the operand instead, it would not
be obvious to someone reading the assembler listing what local
variable the operand corresponds to.

In C++ this object is not considered a pointer , it is an array type object.
The C++ standard refers to it as a non modifiable object of array type.

That's right, in C++ an array is not a pointer, although in some
contexts array-to-pointer conversion is done and the result of the
conversion is a pointer to the first element of the array.

What is stored in an array type object is the contents of an array.
An array type object does not store a pointer to an array, unless, of
course, the array contains array pointers, such as an array with the
declarator (*x[5])[3].
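
For reference, here is a sketch of an array with that declarator, assuming C++11 or later. The function name array_of_array_pointers and the arrays a and b are made up for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: declares an array whose elements are pointers to
// arrays of 3 int and checks what it stores.
bool array_of_array_pointers() {
    int a[3] = {1, 2, 3};
    int b[3] = {4, 5, 6};
    int (*x[5])[3] = {&a, &b};   // the remaining three elements are null

    static_assert(std::is_same<decltype(x), int (*[5])[3]>::value,
                  "x is an array of 5 pointers to array of 3 int");
    static_assert(sizeof(x) == 5 * sizeof(int (*)[3]),
                  "x stores exactly five pointers");

    // x[1] points to b, so (*x[1])[2] is b[2].
    return (*x[1])[2] == 6 && x[4] == nullptr;
}
```

Here the pointers are the contents of the array because the element type is a pointer type, not because arrays in general store a pointer.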