Re: calling convention stdcall and cdecl call

From: "Alf P. Steinbach" <alfps@start.no>
Newsgroups: microsoft.public.vc.language
Date: Mon, 21 Jul 2008 09:19:13 +0200
Message-ID: <VoqdnWvNL8zrpRnVnZ2dnUVZ_uSdnZ2d@posted.comnet>

* Igor Tandetnik:

"Alf P. Steinbach" <alfps@start.no> wrote in message
news:Ta-dnXgco-Stfh7VnZ2dnUVZ_v7inZ2d@posted.comnet

This is going far out in details about matters not related to the
original error you made, when stating that stdcall can't support
variadic functions.

Or the error you made when you declared the calling convention you
invented to be the same as existing stdcall.

I have not invented a calling convention (or to be precise, I haven't done that
or referred to that in this thread). Nor have I claimed to invent a calling
convention here. Since I haven't invented a calling convention here, I have
certainly not declared it to be the same as stdcall.

On the contrary, I have posted some working code that uses stdcall calling
convention.

Said code successfully calling a variadic function, using stdcall convention.

I'm trying to answer as best I can, but I think if you continue this
you will sooner or later find something unrelated to original issue
that I don't know (I don't know all).

So what have you then achieved?

I seem to have misplaced my crystal ball. Without it, it's hard to know
what I will or will not achieve at some undetermined point in the
future.

Seems like you're extremely, extremely afraid of admitting a simple error.

Now let's consider stdcall with variable number of arguments and a
function that doesn't infer its other arguments from some known
argument(s). In that case, the requirements of stdcall /dictate/
that somehow the argument stack area is passed: it is a direct
logical consequence of documented stdcall requirements.

Well, documented stdcall requirements state that it can't be used for
variadic functions in the first place, but I'll let it slide.

No you shouldn't let that slide.

And who gave you the authority to tell me what I should or shouldn't let
slide?

What kind of question is that?

When you make statements in a public debate you should be prepared for
respondents telling you about your mistakes.

In a sense, by debating publicly you're saying to anyone who'd care to join that
they're free to criticize your statements and inform you of errors, and I'm part
of that public -- especially since you're debating with me.

This should not be so hard to understand.

Authority arguments and whatnot are simply fallacies, and you should not use
them if you want to be taken seriously.

I'm not familiar with any such
documented requirements. There is documentation of the Visual C++
"__stdcall" keyword, that I read as that it doesn't implement stdcall
but instead cdecl calling convention when applied to variadic
function.

Right. So therefore, stdcall is not supported for variadic functions.

If you mean, it's not currently supported by Visual C++, then that is correct,
and meaningless, because a calling convention is not defined by what Visual C++
supports (it's a bit more general than that).

If you mean, it's not supported by any tool, then that may or may not be
correct, but is meaningless because you can't know that.

Anyway it's meaningless.

I'm assuming you are talking about your modified stdcall.

I'm talking about stdcall calling convention, implemented any way
that works. :-)

That's just a word game.

I think I (perhaps mistakenly) a short time ago vowed to stop using possibly
offensive correct labels for such statements.

But I'm sure you can think of an appropriate correct label for something so
completely and utterly meaningless, a contemptuous brushing aside of an honest
effort to answer your question completely and technically.

The answer is true, and as precise as possible, but evidently not to your taste.

As a simple first example, consider then

void bar( ... );
void foo( ... ) { bar( someNotationForPassingOriginalArgs ); }

which includes the case of a recursive foo reusing args, and the
special case of a tail recursive foo reusing args.

I'm not sure I follow. Tail recursion is usually eliminated by simply
jmp-ing to the beginning of the function, not by mucking with stack
frames. That would work equally well in stdcall or cdecl function.
Basically, a tail recursion is rewritten as a loop - surely both
stdcall and cdecl functions can run loops.

Yes, both can handle tail recursion. The point of mentioning that was
just that also stdcall can handle tail recursion efficiently in this
particular case. It seems I must stop mentioning details in advance.

And cdecl can only handle recursion less efficiently in this particular
case?

I don't understand that question, which probably means it's meaningless or
designed to deceive.

And I don't see how non-tail recursion could reuse arguments. After
all, the original call would need to preserve some state in order to
continue with its work after the recursive call. How would it do
that, while allowing the recursive call to trample on its stack
frame?

With stdcall the function knows the size of the argument area on the
stack.
All it needs to do is to either copy that area or reuse it, depending
on what state on the stack (local variables below) it needs to
preserve or not.

Ah. I misunderstood what you meant by "reusing arguments". I thought you
meant reusing them in-place, so that the current call gives up its
allocated stack frame to a recursive call. Which would be pretty much
the same as jmp-ing to the beginning (of the same or different
function), suitable only for tail recursion or tail call.

What you seem to be proposing is a new feature in the C and/or C++
language, whereby one variadic function can call another and pass to it
the exact same set of arguments it itself received, by copying them over
down-stack, below its own locals.

No, that fanciful description of my intentions is either designed to deceive, or
a description based on complete lack of understanding.

I hope it is the latter, but I'm pretty sure it is the former, because it takes
some thinking to come up with such.

You sure weren't kidding when you said
this is going far out.

I guess I even know what someNotationForPassingOriginalArgs might look
like:

void bar(va_list vl);

Same effect, no need to copy anything, works fine with cdecl.

No, that is incorrect, in at least two ways.

First, if you don't have a named first argument you can't obtain a va_list.

Second, if the bar function is not yours to control, you can't ensure it will
take a va_list as argument. And if it is yours to control, then there's not much
point in calling it via an intermediary function.
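
For readers following along, the va_list pattern under discussion can be sketched as below. This is a minimal illustration, not code from the thread: the names `format` and `vformat` are hypothetical, and the 256-byte buffer is an arbitrary assumption. It shows both objections at once: `format` needs its named `fmt` parameter in order to call `va_start`, and the forwarding only works because `vformat` was written to accept a `va_list`.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical callee: it must be declared to take a va_list for this to work.
std::string vformat(const char* fmt, va_list args) {
    char buf[256];  // assumed size; longer output is truncated by vsnprintf
    std::vsnprintf(buf, sizeof buf, fmt, args);
    return std::string(buf);
}

// Hypothetical variadic wrapper: va_start requires the named parameter fmt.
std::string format(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string result = vformat(fmt, args);
    va_end(args);
    return result;
}
```

A call such as `format("%d-%s", 7, "x")` forwards its arguments without copying them, but only through a callee with the `va_list` signature.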

What is this someNotationForPassingOriginalArgs you are talking
about? Could you elaborate? Perhaps with an illustrative assembly
sequence?

Current C or C++ do not have notation for argument forwarding.

Correct. The va_list parameter seems to do the job.

No, it doesn't (see above).

C++0x
will have such notation, specific syntax for argument forwarding, but
I'm not familiar with it (IIRC there is a g++ implementation), and I
don't know whether it would be applicable here, although probably it
would be.

http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2002/n1385.htm
http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2002/n1377.htm

This solves an entirely different problem, having nothing to do with
variadic functions and everything to do with generic programming.

Yes and no. It has much to do with variadic functions, although not the kind
we're discussing, but it seems the notation is not applicable here after all.
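
For reference, the forwarding work linked above fed into what later shipped in C++11 as rvalue references and `std::forward`, used together with variadic templates. The sketch below reuses the names `foo` and `bar` from the earlier pseudocode; the three-int signature of `bar` is an assumption made only so the example compiles. It forwards a compile-time parameter pack, which is a different mechanism from a C-style `...` argument list and does not depend on the calling convention.

```cpp
#include <utility>

// Assumed target function; the signature is chosen only for illustration.
int bar(int a, int b, int c) { return a * 100 + b * 10 + c; }

// Variadic template: the argument types are known at compile time,
// so each argument can be forwarded to bar unchanged.
template <typename... Args>
int foo(Args&&... args) {
    return bar(std::forward<Args>(args)...);
}
```

Because `foo` is instantiated per argument list, the compiler always knows the argument count and types, unlike with a C-style variadic function.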

I concede: your modified stdcall can catch some, but not all,
misuses of printf.

Good, except it's not modified. There's no modification of stdcall
requirements.

Word games. We'll just have to agree to disagree here.

In technical matters you need to be precise. Modification means change, and no
aspect of stdcall is changed. Mostly I think this is best described as an(y)
/implementation/ of stdcall for variadic functions.

That is incorrect. First, this page states, and I quote: "the
compiler makes vararg functions __cdecl".

That "incorrect" is an invalid inference.

This "is not" - "is too" game gets tiresome. I suggest we stop it. All
the arguments either way have been exhausted by now, and are obviously
unconvincing to the opposite side. No new arguments seem to be
forthcoming. A reasonable reader should be able to form his or her own
opinion.

Well I completely lost the context of what you're referring to now, but anyway,
in my first response to you in this thread I told you that it's a matter of tool
support, and you have found quotes to prove that wrt. Visual C++ -- but you
insist that those quotes say the opposite. Oh well.

Whether

A) You regard a variadic function declared __stdcall as having
stdcall calling convention.

I can't declare a variadic function __stdcall. Or rather, I can, but it
will use __cdecl calling convention at assembly level. It would be
pretty silly to claim that a function using __cdecl calling convention
is in fact __stdcall.

In other words, if I write

void __stdcall f(int x, ...);

or

void __cdecl f(int x, ...);

the exact same machine code will be generated by VC compiler - one
matching __cdecl calling convention. So function f() is __cdecl, just
misleadingly labeled.

http://www.geocities.com/uniart/mix/kp.htm
If upon a cage of an elephant you will see a sign reading: "buffalo", do
not believe your eyes.

The phrase loses much in translation, unfortunately.

B) You regard a variadic function declared __stdcall as not having
stdcall calling convention, but cdecl calling convention.

I choose the B pill.

In this case your quote is completely irrelevant to what's done
in the stdcall convention.

This is incorrect. The quote is relevant in that it shows that there
ain't no such thing as __stdcall calling convention for variadic
functions. When applied to a variadic function, __stdcall keyword
becomes synonymous with __cdecl.

In Visual C++, yes, __stdcall is replaced with __cdecl for variadic functions.

In other words, Visual C++ doesn't currently support stdcall for variadic functions.

Thus any quote about Visual C++ is irrelevant to what is or is not stdcall
convention for a variadic function, because there's no such thing in Visual C++.

This is not entirely unprecedented. Keywords "class" and "typename" are
synonymous and interchangeable in template parameter list, but are
distinct elsewhere.

They're not quite interchangeable in template parameter lists. When you have a
template template parameter you need to use 'class', not 'typename'.
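
A small sketch of that case, with the hypothetical names `Box` and `wrap`. Under the C++03 rules current at the time of this post (and up through C++14), the keyword introducing the template template parameter `Container` has to be `class`; C++17 later relaxed this to allow `typename` there as well.

```cpp
// Hypothetical one-member class template used as the template argument.
template <typename T>
struct Box { T value; };

// 'Container' is a template template parameter: before C++17 it must be
// introduced with 'class', while the inner parameter may use 'typename'.
template <template <typename> class Container, typename T>
Container<T> wrap(T v) {
    Container<T> c = { v };
    return c;
}
```

For example, `wrap<Box>(5)` yields a `Box<int>` holding 5.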

Second, it doesn't document that the
compiler is allowed, even in some cases, to place additional
information on the stack beyond arguments themselves.

First, I think you mean "pass additional information", since a
requirement to pass that information on the stack would be stupid.

The example you showed at the very beginning did pass additional
information on the stack, not on registers. Was it stupid?

Uh, <teaspoon mode>.

A requirement is different from an example.

What's stupid for a requirement need not be stupid for an example.

For example, a red car isn't stupid as an example of a car; a requirement that
all cars must be red would be stupid.

</teaspoon mode>

Now, if this argument was valid (which it isn't) then RVO
optimization would be prohibited for stdcall functions, as it does
pass additional information.

What additional information does RVO pass? As far as I can tell, it's
done without any help or knowledge of the caller.

RVO passes a pointer to the result object storage.

For example, with

    #include <iostream>

    struct Foo{ int x; Foo(): x(42) {} };

    Foo __stdcall blah() { Foo o; return o; }

    int main()
    {
        Foo o = blah();
        std::cout << o.x << std::endl;
    }

in an ordinary debug build the Visual C++ compiler (which is what the
documentation refers to) adds a hidden argument, the address of 'o',
in register eax.

This is incorrect. The hidden argument is pushed on the stack. This is
the assembly generated by the call (debug build, VC 7.1):

lea eax,[o]
push eax
call blah

And the last instruction of blah() is "ret 4" (to pop this hidden
argument off the stack).

Dang, you got me there. Passed on the stack and not in register eax. Oh dear!

Anyways, regarding your question "What additional information", that's this
pointer you have here.

I gather from this + earlier comments that you don't really understand the
assembly code level.

It does that even if blah() is just declared and defined in another
file.

Yes, when a function returns a class (a non-POD structure or a POD
structure that doesn't fit into EDX:EAX), the compiler passes an
additional argument (the first, leftmost, one). It is a pointer to an
uninitialized buffer (usually allocated on the caller's stack) large
enough to hold the instance. The callee constructs the return value into
this buffer. This works the same for all calling conventions, and is
(rather poorly) documented here:

http://msdn.microsoft.com/en-us/library/984x0h58.aspx

I can't find any documentation of it there, so possibly wrong URL.

But anyway, you didn't understand this completely enough to avoid asking "What
additional information" above.

Are you by any chance George? Or, really, is George you?

However, I don't understand what any of this has to do with return-value
optimization (RVO). RVO is purely a callee's implementation detail.

No, it certainly isn't. :-)

RVO requires the passing in of a pointer to the caller's storage where the
result object should be created.

This has to be done by every caller, and that's why a pointer is passed in.

In
your example, this would be the difference between

// No RVO
void blah(void* returnGoesHere) {
    Foo o;
    new(returnGoesHere) Foo(o);
}

and

// With RVO
void blah(void* returnGoesHere) {
    // Temporary on the stack is elided, instance constructed
    // directly into caller-provided buffer.
    new(returnGoesHere) Foo();
}

The details don't change in the slightest whether blah is stdcall or
cdecl.
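
The two variants quoted above can be made compilable roughly as follows. This is only a sketch: the names `blah_no_rvo`, `blah_rvo` and `count_copies`, and the `copies` counter, are additions for illustration, and the explicit `void*` parameter stands in for the pointer that a compiler would pass implicitly.

```cpp
#include <new>

struct Foo {
    int x;
    static int copies;  // illustration only: counts copy constructions
    Foo() : x(42) {}
    Foo(const Foo& other) : x(other.x) { ++copies; }
};
int Foo::copies = 0;

// Without elision: build a local, then copy it into the caller's buffer.
void blah_no_rvo(void* returnGoesHere) {
    Foo o;
    new (returnGoesHere) Foo(o);
}

// With elision: construct directly in the caller's buffer.
void blah_rvo(void* returnGoesHere) {
    new (returnGoesHere) Foo();
}

// The caller supplies storage either way; returns the number of copies made,
// or -1 if the constructed object does not hold the expected value.
int count_copies(void (*fn)(void*)) {
    alignas(Foo) unsigned char buffer[sizeof(Foo)];
    Foo::copies = 0;
    fn(buffer);
    Foo* result = reinterpret_cast<Foo*>(buffer);
    int x = result->x;
    result->~Foo();
    return x == 42 ? Foo::copies : -1;
}
```

In both variants the caller passes the address of its own storage; the only difference is whether the callee makes an intermediate copy.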

No, but hey, are you sure you're not George?

Anyways, (OK this may be difficult to grok if you don't understand the pointer
argument of RVO, but)

with RVO you have a kind of routine, namely function returning class type
result, that with stdcall and possibly other calling conventions requires each
caller to supply a hidden argument, namely pointer to caller's storage for
result, and

with variadic function you have a kind of routine, namely variadic routine, that
with stdcall and possibly other calling conventions requires each caller to
supply a hidden argument, namely the number of argument bytes.

Note how under stdcall these two kinds of routines impose the same kind of
requirement on callers, that a hidden argument must be passed, and that that's
very compiler (and even option!) dependent.
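
The idea of a byte-count argument can be emulated in portable C++ along the following lines. This is not the stdcall code posted earlier in the thread, which is not reproduced in this message; it is a sketch in which an explicit `argBytes` parameter stands in for the hidden argument, and `sum_ints` is a hypothetical function that assumes every variadic argument is an `int`.

```cpp
#include <cstdarg>
#include <cstddef>

// argBytes plays the role of the hidden argument: the total size in bytes
// of the variadic arguments that follow (assumed here to be all ints).
int sum_ints(std::size_t argBytes, ...) {
    std::size_t count = argBytes / sizeof(int);
    va_list args;
    va_start(args, argBytes);
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

A call looks like `sum_ints(3 * sizeof(int), 1, 2, 3)`. In a real stdcall implementation the compiler, not the programmer, would supply the count, and the callee would use it to know how many argument bytes to remove from the stack on return.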

For one of these kinds of routines you have argued that that hidden argument had
to be part of the stdcall calling convention, that it would not do if two
different compilers or, god help us, compiler option sets, did it differently.

For the other kind, if you're consistent, you'll have to argue the same.

Any such leeway would have to
be documented so that various tools could agree on precise stack
layout (which is, after all, the purpose of a calling convention).

Hm, that's a mixture of good and bad in same sentence.

Let's take the bad first. __stdcall is a calling convention that
applies to e.g. functions like blah() above. I hope you don't
disagree with that.

Of course not. It's a fixed-signature function marked __stdcall.

When such a function has arguments or result of a type that can vary
between C++ compilers or with various options even on given OS, then
it cannot in general be called, without adding in low-level
shenanigans, from source code compiled with any other compiler

Correct. Various tools have to agree on several things, calling
convention being just one of them.

with incompatible options. Thus a calling convention only supplies
interoperability to the degree that languages and their
implementations already allow that interoperability. And in
particular, it does not impose a precise stack layout, for if it did
then it would, e.g., exclude most of C++.

Calling convention does specify precise stack layout, once other things
are also agreed upon. How else would you be able to write functions in
assembly and consume them in C++?

So the "Any such leeway" is invalid (take a look at above RVO code
again).

You keep using this word, but it doesn't mean what you think it means
(unless, of course, you are about to redefine it from its conventional
meaning to suit your argument).

Earlier in the article I'm responding to you asked about what additional
information was passed for the RVO case.

That you don't understand that means you don't understand anything about RVO.

RVO has nothing, I repeat nothing, to do
with calling conventions, and is transparent to the caller.

Given that you had to ask, perhaps you should think a little harder about this.

:-)

On the other hand, in order to adopt such a technique it would be
most practical if the OS vendor, Microsoft, did document their
version.

They did already. I guess they are happy with the way it is.

No, I'm sorry, as far as I know that's incorrect again: as far as I know
Microsoft has not adopted such a technique.

They could even call it a new calling convention, whatever.

I suspect before they would consider doing that, it would have to be
demonstrated that it would benefit them, and/or their customers,
sufficiently enough to be worth the effort. I don't think this burden
has been met yet.

Of course not.

I'm sorry but you're making meaningless noise.

On the third hand, this is talking about hypotheticals, and really
assumes that stdcall can handle variadic functions (which it can, as
demonstrated).

You mean, something you insist on calling stdcall but isn't, can handle
variadic functions. Sorry, couldn't resist this one last time.

What other criterion can one have for stdcall than that all the listed
criteria are fulfilled?

However, the technique is probably not inferior to cdecl.

Well, it is in at least some respects. The call site is same size,
but the callee's epilogue is more involved for modified stdcall.

Perhaps there are other aspects where modified stdcall is better than
cdecl, but I have yet to see a convincing example of those (besides
printf guarding against stack overrun, which in my personal opinion
is not worth the bother: yours of course may differ).

That's turning things on their head. First you're denying that
stdcall can handle variadic functions, but being taken up on that,
now you require me to /convince/ you that it's more efficient. Bah.

Your misunderstanding stems from conflating traditional and modified
stdcalls - claiming that they are one and the same. While I am careful
to distinguish between the two.

Sorry, it's not a misunderstanding: the length of this thread is excessive
evidence of the excessive lengths you go to to avoid admitting an error.

Stdcall can't handle variadic functions.

You can't be that stupid.

Your modified stdcall (or
whatever you want to call it) can. I state that it does so less
efficiently than cdecl does, and so there's no reason for tool authors
to add modified stdcall to their implementation.

Whatever. It's a valid implementation of stdcall, fulfilling all stdcall
requirements, and callable via stdcall convention, actually called and working.
So regarding your statement above, again, I simply refuse to believe you are so
stupid as to deny that this code fulfills all stdcall criteria.

Since cdecl
can always be used as an alternative except for e.g. the two cases
discussed above, which anyway aren't supported today, what matters
in practice is speed and size, where only measurements can tell, and
then perhaps not even in general but just for specific applications
and contexts. I think it would *probably* come out the winner on
both counts.

Since the call site is same size, and the function body is strictly
larger and more complicated, I don't see under what set of
circumstances modified stdcall can ever win such a benchmark.
Definitely not by size, which is predictable. Could you explain how
modified stdcall can be faster than cdecl, even theoretically? What
combination of caching, branch prediction or other arcane factors
could possibly help it out?

Simply by having available more information it permits optimizations
to be made, because optimizations are in the end only ways to exploit
available information

So far, the only optimization you showed affects a language feature that
doesn't even exist. First you introduce a pessimization - the function
has to copy over its argument list - and then say that an optimizing
compiler may, in some cases, perhaps optimize this copy away.

You're very confused here, but considering that you don't understand assembly
level that's understandable.

The optimization you're referring to has nothing to do with optimization
permitted by calling convention relative to other calling conventions. It's
purely an internal thing for /additional functionality/ enabled by the calling
convention. When comparing performance to other calling conventions you have to
compare like functionality, not functionality that's only available with one.

All the
while, there's an existing mechanism that achieves the same goal and
doesn't require any copy (witness vprintf).

See above, it isn't and doesn't.

You're seriously confused.

Color me unimpressed.

It's not surprising that you're not impressed by something that's not meant to
impress.

I resent your insinuation that this technique was presented in order to impress.

It was presented solely as a very hands-on argument that you made an error, and
you're seemingly in denial about that, choosing instead insinuations etc.

I
suspect (but don't have any proof, before you ask) that existing CPUs
are carefully optimized for existing calling conventions, rather than
hypothetical ones.

Well I don't know and I don't care about the efficiency; as I wrote
above only measurements can tell. If my speculation on that turns out
to be wrong, or right, so what? Why are you interested in
micro-efficiency for a variadic function, typically inefficient
anyway?

I'm not. I'm perfectly OK with variadic functions continuing to use
cdecl, and fixed-signature functions continuing to use stdcall (or
cdecl, for that matter).

Good. Then your earlier strong interest was just an infatuation, say. Or
perhaps a very strong desire to learn about basics.

I'm only saying that if you come up with a new idea, and suggest other
people do work based on this idea (as in, for Microsoft and perhaps
other compiler vendors to support modified stdcall in their tools), then
it seems to be incumbent on you to demonstrate that the idea produces
some improvement over existing state of affairs. So far, I have seen
some downside, however minor, and not much in the way of an upside -
probably not enough to tip the scales towards "let's do work" decision
(the scales, as usual, start heavily skewed towards "don't do work").

Your
characterization as "inferior" is however not backed up by any
argument

That is incorrect. I have mentioned it many times before: the call
site is same size, but the function epilogue is larger with modified
stdcall than it is with cdecl.

Well that's not something anyone objective and competent would use to
measure superiority or inferiority by.

Why? It makes my compiled program larger - that sounds like "inferior"
to me. What do I get in return?

I'm not counting bytes in the range of ten or twenty.

It's interesting that you do.

[snipped rest of babble, out of time]

Cheers,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?