Re: Vectors vs Arrays performance
On Jan 2, 10:16 pm, "Balog Pal" <p...@lib.hu> wrote:
"James Kanze" <james.ka...@gmail.com>
That's bullshit. You have to dereference a pointer in
both cases. In the case of indexing, you have to
calculate the address (both for stack based and for
std::vector), and because these calculations are hidden
in some functions with std::vector (and probably do
bounds checking as well), it's harder for the optimizer
to get its hands on them. On the other hand, if you're
using iterators, your code with std::vector is basically
what the optimizer will do with indexing into a built-in
array.
It is not bullshit: in the case of the stack array it is
simply an offset from the current stack pointer; no
dereference is required.
You can't access the value without having its address. And
the value is in memory, so accessing it involves
dereferencing the address.
Come on, the difference must be evident with any stack-using
implementation -- that is the most common.
Having actually implemented compilers for stack based
implementations, it's quite evident to me that the difference
depends, and that the extra dereference should actually be quite
rare.
For a direct array on the stack, the base address is already in
the stack or frame pointer, and needs no extra load. For the
other, the base is at an arbitrary location that must be loaded
into a register with an extra instruction.
Most of the time, on most hardware, the base pointer for the
array will also already be in a register. Unless you've got a
very bad compiler. (It's a little more difficult for Intel
architecture, because of the paucity of registers. But
Microsoft's C 1.0 managed it most of the time, so it's hardly
leading edge, even on Intel.)
The assembly presented shows exactly that.
The assembly shows that when you surround a single access with
inline assembler and accesses to a volatile, the compiler
doesn't optimize very well. Something I was aware of before
even looking at the assembler.
Whether that single extra load, and the associated new area in
cache, actually shows up in measurements of an application is a
different question; the difference may be masked entirely by a
stall, but it may just as well show.
It won't show, because it won't be present in any critical
paths.
Why does everyone suppose that compilers are so stupid? This
sort of optimization has been present in most compilers for well
over 20 years now.
Take a look at the assembler output if you do not believe me:
you should see an extra instruction for the dereference of
vector operator[] compared to the stack array []. If you are
using a pointer/reference to the array, then they will be the
same.
#include <vector>

int main()
{
    volatile int output;
    int a[10];
    std::vector<int> v(10);
    _asm int 3;
    output = a[0];
    _asm int 3;
    output = v[0];
    _asm int 3;
}
What on earth is the volatile doing in there? And the _asm?
Irrelevant to the point,
Very relevant to the point, because they create an environment
in which the compiler won't optimize.
but without such tricks MSVC would just generate
an empty return for main and remove all the code.
Probably. But ensuring that no optimization is possible isn't
very realistic either, since in real code, optimization is
possible.
Note too that the old rule: "never trust a benchmark you didn't
falsify yourself" holds here as well. If I wanted to "prove"
that an extra instruction was required, I know how to write a
"realistic" benchmark that would do it. Except that 1) it would
really only prove the point for a given compiler on a given
architecture, and 2) "realistic" is in the eyes of the beholder:
while volatile and inline assembler certainly aren't realistic,
who knows where you really have to draw the line. (FWIW: my
case would use a virtual function returning a[i] from a private
member array. In my experience, virtual functions are really
great optimization blockers for a lot of compilers---including
VC++. But is it realistic to have a virtual function which just
makes one access to a single element? That probably depends on
the application domain, but I've never seen it; whenever I'm
using an array, I'm accessing a large number of elements
locally, which means that the compiler will have hoisted the
pointer to the array data into a register. Regardless of where
that pointer comes from.)
Notice the extra instruction?
I noticed a lot of extra instructions: a volatile, and some
_asm, to begin with.
it's
mov eax, DWORD PTR _a$[esp+72]
vs.
mov ecx, DWORD PTR _v$[esp+76]
mov eax, DWORD PTR [ecx]
Yes, but that code wouldn't necessarily be generated if there
weren't a volatile or any inline assembler.
If there is an offset calculation, it would be added to the last
line in each case; one extra fetch is mandatory.
One extra fetch is mandatory IF your code is twisted enough to
inhibit all compiler optimizations. In practice, the real code
I've seen isn't (and the architecture hasn't been Intel, which
changes things as well), and pointers to the array have always
been in a register.
[...]
TANSTAAFL: extra indirections generally come with some cost.
We probably accept that for the benefits on the other side, and
should not deny it.
Except that the extra indirection isn't always, or even usually,
present. If you pass the array to a function, and access it
there, for example, there is no difference between an array on
the stack and an array on the heap. (There may be a difference
between a C style array and std::vector, because many compilers
systematically pass class type objects in memory, but pointers
in a register.) If you use the array several times in the same
function, the compiler will also hoist the address calculations
up to before the first use---you don't get an extra access for
each use. And so on. The only way you can compare is to
implement both versions, with real code, and time them.
--
James Kanze