Re: calling convention stdcall and cdecl call
This is going far out in details about matters not related to the original error
you made, when stating that stdcall can't support variadic functions.
I'm trying to answer as best I can, but I think if you continue this you will
sooner or later find something unrelated to the original issue that I don't know (I
don't know everything).
So what have you then achieved?
* Igor Tandetnik:
"Alf P. Steinbach" <alfps@start.no> wrote in message
news:VKednQ8L0_gmEB7VnZ2dnUVZ_iydnZ2d@posted.comnet
* Igor Tandetnik:
Let's state it this way: I personally claim, looking at the details
of the two conventions, that the only advantage of stdcall over
cdecl lies in removing one machine instruction from the call site.
This is its one and only advantage over cdecl.
No, that is incorrect.
In what way is that incorrect?
You mean "ways".
Let's first consider stdcall with fixed number and types of
arguments. This point is not important, but from an academic point of
view it is an extra advantage, and thus sufficient to say your statement
is not correct. The next two points, for variable number of
arguments, are of more practical significance, but you may be
unwilling to consider those points because current tools such as
Visual C++ do not implement those advantages (it is a tool
limitation).
The academic point, with fixed number and types of args: with cdecl
the function needs to return to the caller to have the stack area
used for arguments deallocated. With stdcall convention, when the
argument data is no longer needed it can freely deallocate that area
(by increasing SP) and reuse it in calls to other functions.
So can cdecl, as long as at the end it leaves ESP as it found it:
; arguments no longer needed
add esp, <size of arguments>
call otherFunction
sub esp, <size of arguments>
I think you're right, and this was an incorrect example. Mea culpa. It would
hold for variable number of arguments, but not for fixed number of arguments.
Which
might matter when that area is large -- it's not an optimization
that I know any compiler to do, and it's only applicable in some
corner cases (large stack area, and in practice no local
variables), but it is an optimization that is available with stdcall,
and not with cdecl.
That is incorrect. As I've shown, it is available with cdecl also.
Now let's consider stdcall with variable number of arguments and a
function that doesn't infer its other arguments from some known
argument(s). In that case, the requirements of stdcall /dictate/ that
somehow the argument stack area is passed: it is a direct logical
consequence of documented stdcall requirements.
Well, documented stdcall requirements state that it can't be used for
variadic functions in the first place, but I'll let it slide.
No, you shouldn't let that slide. I'm not familiar with any such documented
requirements. There is documentation of the Visual C++ "__stdcall" keyword, which
I read as saying that it implements not the stdcall but the cdecl calling convention
when applied to a variadic function. If you read it differently then you really
have to decide, as described in A and B below, because then you have a self-contradiction.
I'm assuming you are talking about your modified stdcall.
I'm talking about stdcall calling convention, implemented any way that works.
:-) One way is the way I illustrated by a complete working example.
As a simple first example, consider then
void bar( ... );
void foo( ... ) { bar( someNotationForPassingOriginalArgs ); }
which includes the case of a recursive foo reusing args, and the
special case of a tail recursive foo reusing args.
I'm not sure I follow. Tail recursion is usually eliminated by simply
jmp-ing to the beginning of the function, not by mucking with stack
frames. That would work equally well in stdcall or cdecl function.
Basically, a tail recursion is rewritten as a loop - surely both stdcall
and cdecl functions can run loops.
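The tail-recursion point can be shown in standard C++. Below are a tail-recursive factorial and the loop that tail-call elimination effectively turns it into. The function names are illustrative. Whether a particular compiler actually performs this transformation is not guaranteed.

```cpp
#include <cassert>

// Tail-recursive form: the recursive call is the last action, so nothing in
// the current frame is needed after it returns.
unsigned long long factorialRec(unsigned n, unsigned long long acc = 1) {
    assert(n <= 20);                       // 21! overflows 64 bits
    if (n <= 1) return acc;
    return factorialRec(n - 1, acc * n);   // tail call
}

// What tail-call elimination amounts to: overwrite the parameters and jump back
// to the top. No new frame is built, so no argument area needs cleaning up,
// whichever convention the function was declared with.
unsigned long long factorialLoop(unsigned n, unsigned long long acc = 1) {
    assert(n <= 20);
    while (n > 1) {
        acc *= n;
        --n;
    }
    return acc;
}
```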
Yes, both can handle tail recursion. The point of mentioning that was just that
also stdcall can handle tail recursion efficiently in this particular case. It
seems I must stop mentioning details in advance.
And I don't see how non-tail recursion could reuse arguments. After all,
the original call would need to preserve some state in order to continue
with its work after the recursive call. How would it do that, while
allowing the recursive call to trample on its stack frame?
With stdcall the function knows the size of the argument area on the stack.
All it needs to do is to either copy that area or reuse it, depending on what
state on the stack (local variables below) it needs to preserve or not.
The cdecl variant can't copy the area, nor reuse it, because it doesn't know the
size of the area.
What is this someNotationForPassingOriginalArgs you are talking about?
Could you elaborate? Perhaps with an illustrative assembly sequence?
Current C or C++ do not have notation for argument forwarding. C++0x will have
such notation, specific syntax for argument forwarding, but I'm not familiar
with it (IIRC there is a g++ implementation), and I don't know whether it would be
applicable here, although probably it would be.
At the assembly level it's very easy, at least conceptually. E.g.
void __stdcall bar( int x, ... ) { /* do things */ }
double __stdcall foo( int x, ... )
{
double y;
char z;
y = 3.14;
bar( x, someNotationForPassingOriginalArgs ); // X
return y;
}
Let's say this is 32-bit x86 code, with the usual prolog
push ebp
mov ebp, esp
Let's further say the number of bytes of arguments is passed in via ECX.
Then at point X all that's needed is to copy the ECX-4 bytes of variadic arguments
from [EBP+12] (just above x at [EBP+8]) into a new area below ESP, with corresponding
adjustment of ESP, then push x and call bar.
An optimizing compiler might optimize this in various ways.
E.g., since it knows x is not modified there's no need to treat x separately, it
can just copy ECX bytes.
And if it knows the stack requirements of bar, and that the contents in the
incoming arguments area are not significant after that call, it can forego the
whole copying by just making room for bar's stack requirement, placing y and z
below, and then efficiently reuse the area.
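Portable C++ has no notation for this, so the following is only a simulation of the idea. The argument area is modelled as a byte buffer, and its size travels with it, standing in for the byte count assumed above to arrive in ECX. `ArgArea`, `foo`, `bar` and `callFoo` are hypothetical. `bar` simply sums 4-byte int slots so that the forwarded data is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simulation only: an argument area is a byte buffer whose size travels with
// it, standing in for [EBP+8] and the byte count assumed to arrive in ECX.
struct ArgArea {
    const unsigned char* base;
    std::size_t bytes;
};

// Hypothetical bar(int x, ...): treats the area as x followed by int32 values
// and returns their sum, so the test can see what it received.
std::int64_t bar(ArgArea a) {
    assert(a.bytes >= 4 && a.bytes % 4 == 0);
    std::int64_t total = 0;
    for (std::size_t off = 0; off < a.bytes; off += 4) {
        std::int32_t v;
        std::memcpy(&v, a.base + off, sizeof v);
        total += v;
    }
    return total;
}

// Hypothetical foo(int x, ...) forwarding all of its arguments to bar. Since
// it knows the size of its own argument area, it can copy the whole area
// (x included, the "just copy ECX bytes" variant) into a fresh area.
std::int64_t foo(ArgArea a) {
    std::vector<unsigned char> fresh(a.base, a.base + a.bytes);
    return bar(ArgArea{fresh.data(), fresh.size()});
}

// Test convenience: lay out int32 arguments contiguously and call foo.
std::int64_t callFoo(const std::vector<std::int32_t>& args) {
    std::vector<unsigned char> area(args.size() * sizeof(std::int32_t));
    if (!args.empty()) std::memcpy(area.data(), args.data(), area.size());
    return foo(ArgArea{area.data(), area.size()});
}
```

The simulation shows only one thing: a callee that knows the size of its own argument area can copy it when forwarding, or with more cleverness reuse it. The text attributes that property to stdcall and not to cdecl.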
Works simply and nicely with stdcall (whichever general convention is
used to deal with this within the constraints of stdcall convention),
whereas with cdecl would need special purpose modification -- a
mechanism like the one for stdcall, rendering the whole point of
cdecl moot -- to do it.
Again, it's not clear to me from your description exactly _what_ works
simply and nicely with stdcall, and doesn't with cdecl. Could you
demonstrate these simple and nice workings?
See above.
cdecl doesn't have enough information.
Or if it's given that information, then it just has overhead compared to
stdcall, and no advantage.
As another and equally important example, with stdcall a function
such as printf has a means of checking that it has indeed been passed
enough bytes for the stated format specification: although the printf
function is of a form such that it doesn't know whether those bytes
are the right number and types of arguments, i.e. doesn't know enough
to determine that the call is OK, it does know enough to in many
cases say the call isn't OK (as with MS' newfangled buffer length
checking string handling functions). I think I mentioned this earlier
to you, but perhaps it was to the other guy.
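Here is a sketch of the check. It assumes the callee is told how many argument bytes were actually passed, as `passedBytes`. That is a hypothetical parameter; standard `printf` has no such thing. It also assumes a 32-bit x86 layout where `int`, pointers and promoted `char` take 4 bytes and `long long` and `double` take 8. Only a subset of printf conversions is parsed. `bytesNeededByFormat` and `callIsDefinitelyBad` are invented names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>

// Assumed 32-bit x86 slot sizes: int, pointers and promoted char/short are 4
// bytes; long long and double (including promoted float) are 8 bytes.
// Parses only a subset of printf conversions. Returns the number of argument
// bytes the format needs, or nullopt if it meets something it does not handle.
std::optional<std::size_t> bytesNeededByFormat(const char* fmt) {
    assert(fmt != nullptr);
    std::size_t need = 0;
    for (const char* p = fmt; *p; ++p) {
        if (*p != '%') continue;
        ++p;
        if (*p == '%') continue;                       // "%%" takes no argument
        while (*p == '-' || *p == '+' || *p == ' ' || *p == '#' || *p == '0') ++p;
        auto skipWidthOrPrecision = [&] {
            if (*p == '*') { need += 4; ++p; }         // '*' consumes an int
            else { while (std::isdigit(static_cast<unsigned char>(*p))) ++p; }
        };
        skipWidthOrPrecision();
        if (*p == '.') { ++p; skipWidthOrPrecision(); }
        int longs = 0;
        while (*p == 'l') { ++longs; ++p; }
        while (*p == 'h') ++p;
        switch (*p) {
            case 'd': case 'i': case 'u': case 'x': case 'X': case 'o':
                need += (longs >= 2) ? 8 : 4; break;   // "ll" is 8 bytes, "l" is 4
            case 'c': case 's': case 'p':
                need += 4; break;
            case 'f': case 'e': case 'E': case 'g': case 'G':
                need += 8; break;
            default:
                return std::nullopt;                   // unknown, or format ends early
        }
    }
    return need;
}

// True when the caller certainly passed too few bytes. A false result does
// not prove the call correct: wrong types of the right total size pass.
bool callIsDefinitelyBad(const char* fmt, std::size_t passedBytes) {
    std::optional<std::size_t> need = bytesNeededByFormat(fmt);
    return need && *need > passedBytes;
}
```

As the text says, this can only show that a call is wrong. A call that passes enough bytes of the wrong types still passes the check.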
I don't remember you mentioning it to me, but I do remember mentioning
it to you in the very post you are replying to.
I concede: your modified stdcall can catch some, but not all, misuses of
printf.
Good, except it's not modified. There's no modification of stdcall requirements.
By all means. The technique you propose is different from
"traditional" __stdcall as currently implemented by VC compiler:
"not supported by tools". Are you suggesting there exists, or used
to exist, a compiler that implemented stdcall the way you describe?
If so, could you cite a reference? Otherwise, the technique is
properly characterized as "new", as in "never before implemented".
Oh, thanks. Yes, it's possible, although doubtful, that the technique
is new in the Microsoft world. However, it conforms fully to the
requirements on stdcall as documented at e.g. <url:
http://msdn.microsoft.com/en-us/library/zxk0tw93(VS.71).aspx>.
That is incorrect. First, this page states, and I quote: "the compiler
makes vararg functions __cdecl".
That "incorrect" is an invalid inference.
You need to decide:
Whether
A) You regard a variadic function declared __stdcall as having stdcall
calling convention.
In this case your argument that stdcall doesn't support variadic
function is contradicted, because you're then saying it does, and
that stdcall convention for variadic function means the same as
with cdecl.
or
B) You regard a variadic function declared __stdcall as not having
stdcall calling convention, but cdecl calling convention.
In this case your quote is completely irrelevant to what's done in the
stdcall convention.
Second, it doesn't document that the
compiler is allowed, even in some cases, to place additional information
on the stack beyond arguments themselves.
First, I think you mean "pass additional information", since a requirement to
pass that information on the stack would be stupid.
Now, if this argument were valid (which it isn't) then RVO would be
prohibited for stdcall functions, as it does pass additional information.
For example, with
struct Foo{ int x; Foo(): x(42) {} };
Foo __stdcall blah() { Foo o; return o; }
int main()
{
Foo o = blah();
std::cout << o.x << std::endl;
}
in an ordinary debug build the Visual C++ compiler (which is what the
documentation refers to) adds a hidden argument, the address of 'o', in register
eax.
It does that even if blah() is just declared and defined in another file.
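Conceptually, returning `Foo` through a hidden pointer looks like the following C++ sketch. The caller provides storage, and the callee constructs the result in place. `blahLowered` and `useLowered` are illustrative names. The calling-convention keyword is dropped so the code compiles anywhere. This is not claimed to be the exact code Visual C++ emits.

```cpp
#include <cassert>
#include <new>

struct Foo { int x; Foo() : x(42) {} };

// Source form, as in the example above (calling-convention keyword dropped).
Foo blah() { Foo o; return o; }

// Conceptual lowering with a hidden result pointer: the caller supplies the
// storage and the callee constructs the object directly in it, which is what
// lets the copy be elided. Illustrative only, not actual compiler output.
Foo* blahLowered(void* hiddenResult) {
    return new (hiddenResult) Foo();   // construct in caller-provided storage
}

// Caller side of the lowered form: reserve room for 'o' and pass its address.
int useLowered() {
    alignas(Foo) unsigned char storage[sizeof(Foo)];
    Foo* o = blahLowered(storage);
    assert(static_cast<void*>(o) == static_cast<void*>(storage));
    return o->x;                       // Foo is trivially destructible
}
```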
Any such leeway would have to
be documented so that various tools could agree on precise stack layout
(which is, after all, the purpose of a calling convention).
Hm, that's a mixture of good and bad in same sentence.
Let's take the bad first. __stdcall is a calling convention that applies to e.g.
functions like blah() above. I hope you don't disagree with that. If you do
disagree that this __stdcall function is a stdcall function then a few weeks
debate on the contexts in which __stdcall really denotes stdcall may be ahead...
When such a function has arguments or result of a type that can vary between C++
compilers or with various options even on given OS, then it cannot in general be
called, without adding in low-level shenanigans, from source code compiled with
any other compiler or with incompatible options. Thus a calling convention only
supplies interoperability to the degree that languages and their implementations
already allow that interoperability. And in particular, it does not impose a
precise stack layout, for if it did then it would, e.g., exclude most of C++.
So the "Any such leeway" is invalid (take a look at above RVO code again).
On the other hand, in order to adopt such a technique it would be most practical
if the OS vendor, Microsoft, did document their version. They could even call it
a new calling convention, whatever. Then it would be possible for other vendors
to e.g. supply stdcall printf and have it consumed by MS compiler.
On the third hand, this is talking about hypotheticals, and really assumes that
stdcall can handle variadic functions (which it can, as demonstrated).
You said your technique would support variadic functions. The
"traditional" stdcall doesn't (again, "not supported by tools"). I
assumed you would claim this fact as an improvement. However, if you
don't maintain your technique is an improvement, then we are in
agreement. It's been my point all along that the mechanism you
propose, while possible, is inferior to existing alternatives.
As I've stated I don't think it's in practice an improvement.
Great. So we _are_ in agreement after all.
On that point, but I suspect for different reasons.
However, the technique is probably not inferior to cdecl.
Well, it is in at least some respects. The call site is same size, but
the callee's epilogue is more involved for modified stdcall.
Perhaps there are other aspects where modified stdcall is better than
cdecl, but I have yet to see a convincing example of those (besides
printf guarding against stack overrun, which in my personal opinion is
not worth the bother: yours of course may differ).
That's turning things on their head. First you're denying that stdcall can
handle variadic functions, but being taken up on that, now you require me to
/convince/ you that it's more efficient. Bah.
Since cdecl
can always be used as an alternative except for e.g. the two cases
discussed above, which anyway aren't supported today, what matters in
practice is speed and size, where only measurements can tell, and
then perhaps not even in general but just for specific applications
and contexts. I think it would *probably* come out the winner on both
counts.
Since the call site is same size, and the function body is strictly
larger and more complicated, I don't see under what set of circumstances
modified stdcall can ever win such a benchmark. Definitely not by size,
which is predictable. Could you explain how modified stdcall can be
faster than cdecl, even theoretically? What combination of caching,
branch prediction or other arcane factors could possibly help it out?
Simply by having available more information it permits optimizations to be made,
because optimizations are in the end only ways to exploit available information
(e.g. cache this chunk of stack where I have my arguments). But whether such
optimizations would be made is purely hypothetical. Whether this level of
micro-optimization matters is highly doubtful. And so on. I see this line of
questioning as just a way to go off on a tangent.
I
suspect (but don't have any proof, before you ask) that existing CPUs
are carefully optimized for existing calling conventions, rather than
hypothetical ones.
Well I don't know and I don't care about the efficiency; as I wrote above only
measurements can tell. If my speculation on that turns out to be wrong, or
right, so what? Why are you interested in micro-efficiency for a variadic
function, which is typically inefficient anyway?
and furthermore supports more functionality and better safety
Are you talking about printf size checking case? I guess it counts as
better safety (though not by much), but more functionality? I've yet to
see an example of that.
See earlier in this article.
Your
characterization as "inferior" is however not backed up by any
argument
That is incorrect. I have mentioned it many times before: the call site
is same size, but the function epilogue is larger with modified stdcall
than it is with cdecl.
Well that's not something anyone objective and competent would use to measure
superiority or inferiority by. Anyway what does inferiority or superiority in
/any/ respect have to do with your error?
and given your question above about advantages I think you
haven't even considered safety
I must admit I haven't before you mentioned it (I guess you could have
mentioned it before resorting to insults, but whatever). Having seen
your printf example, I'm not convinced it's much of an improvement,
though I concede it is some improvement.
Agreed.
I don't really think adding safety at that low level improves safety.
Instead it might make it easier to resort to using that level instead of the
significantly safer higher levels.
[snip]
It seems to be an argument over terminology. You seem to say:
stdcall is any arrangement, no matter how complicated, where the
callee, rather than the caller, ends up cleaning the stack. I
concede: under this expanded definition of stdcall, it is possible
to have a stdcall variadic function.
How can you say that Microsoft's own definition of stdcall is an
expanded one?
What is the /unexpanded/ (in your view) definition of stdcall?
Microsoft's own definition of stdcall doesn't involve passing total
size of arguments to the callee. Your does, at least "in some
cases". In this sense yours is "expanded".
OK, but that's just a word game.
So is claiming that your "not supported by tools" stdcall is the same
calling convention as the "supported by tools" variety. Like I said,
it's an argument over terminology. Everything is relative, right?
Please provide a link to such unexpanded definition.
http://msdn.microsoft.com/en-us/library/zxk0tw93.aspx
http://msdn.microsoft.com/en-us/library/a5s9345t.aspx
http://msdn.microsoft.com/en-us/library/25687bhx.aspx
On the last page, note the diagram of the stack frame for __stdcall
function. No evidence of total size, or any other additional
information beyond function arguments themselves.
He he, ROTFL. Have you considered a register?
Yes I have. The calling convention documents the use of registers, too.
Consider __thiscall and __fastcall, shown on this same page.
He he. Nothing there about registers in general, or EAX for RVO on stdcall
function... :-) But I looketh, and somehow ended up on <url:
http://msdn.microsoft.com/en-us/library/f9t8842e(VS.71).aspx>, documenting an
[.exe]'s entry point, as specified by linker /ENTRY, to have WinMain signature...
You know, this documentation isn't worth anything: it's not quality.
It's what we have to relate to, though, and for that you have to add simple
sound judgement, not take the incompetent tech-writer's word as gold, or, in
particular, as you do, think that something not mentioned is forbidden.
Not that concrete
examples that do not illustrate the relevant context are relevant in
any way (disclaimer: I haven't looked at these examples, confident
that they're not relevant).
Well, given that we are talking about variadic functions, and that the
__stdcall documentation (the first cited page) says "the compiler makes
vararg functions __cdecl", it's hard to expect an example of something
that's explicitly not supported.
By Visual C++.
As mentioned, it's just a tool issue.
It is not at all difficult to /design/ foo so that e.g. first
argument says how many arguments follow. Nor is it difficult to
provide foo with a library routine it can call in order to get
correct automatic stack cleanup prior to returning.
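The first half of that design can be written in standard C++ using `<cstdarg>`: the leading `count` tells the function how many `int`s follow. `sumInts` is a made-up example. The compiler still chooses the calling convention; per the documentation cited in this thread, Visual C++ makes vararg functions `__cdecl`. The cleanup library routine has no portable C++ equivalent, so it is not shown.

```cpp
#include <cassert>
#include <cstdarg>

// Made-up example of the design described above: the first argument says how
// many int arguments follow, so the callee knows exactly what to read.
long sumInts(int count, ...) {
    assert(count >= 0);
    va_list ap;
    va_start(ap, count);
    long total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```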
That's precisely the same as the compiler directive I said you would
need, the one you ridicule below.
Sorry, then I misunderstood you.
I wonder if, in the future, before calling people stupid, you might stop
and consider that perhaps you have misunderstood something they said. As
you see, it's a possibility.
I wonder if in the future you could refrain from such insinuations?
I strive to only tell people they're being stupid when they actually are
being stupid, and being told might help them: as I recall, in this case that
didn't apply.
In the case of your compiler directive I wrote that "At this point I think
you're into the old habit of speculating about impractical, undoable schemes
again." Instead it appears you were talking about micro-optimization, at least
if your explanation is correct. Then what you wrote, "there would have to be"
such an optimization doesn't make much sense, but so what.
But anyway, the directive's only needed for effecting a silly
micro-optimization (perhaps I shouldn't have discussed it at all,
details tend to obscure the full view).
Well, it is precisely this micro-optimization that makes
currently-existing stdcall an improvement over cdecl. If you drop it,
then my argument, about modified stdcall being worse than cdecl
code-size-wise, stands.
It doesn't exactly stand in the sense of being correct, but it's a possibility
wrt. speed, size or both. Not with respect to other things. Anyway, who cares
what the speed or size or relative merits are -- I for one absolutely don't.
We were discussing your error in stating that stdcall can't support variadic
functions.
For that, the micro-efficiency or micro-size of any solution is irrelevant.
Now that you mention safety, I think I understand the case your
mechanism is supposed to help with. You might be thinking of
something like printf("%d"). If the caller passes total size of
arguments to the callee, va_arg could be instrumented to check that
it doesn't reach beyond those arguments.
No, the mechanism is not supposed to help with that, it just emerges
as a distinct advantage. But yes, it seems that regarding what's
possible here, improved safety, we're now in agreement.
Could you give another example where safety is improved by using
modified stdcall over cdecl?
Why?
I guess it is possible
for the caller to prepare and pass a complete description of actual
argument types, and for va_arg to verify that it's used in accordance
with this description. That could be a valuable debugging aid, but
the overhead would probably be too high for production code.
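Here is a sketch of that debugging aid, under purely hypothetical assumptions. The caller passes a vector of `ArgTag` values describing what it actually passed, and every fetch is checked against that vector before `va_arg` is used. `ArgTag` and `sumBySpec` are invented for illustration. `spec` plays the role of a format string, with `i` meaning int and `d` meaning double.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical type tags a caller could pass to describe its actual arguments.
enum class ArgTag { Int, Double };

// Sums the arguments named by `spec` ('i' = int, 'd' = double; other
// characters are ignored). Before each va_arg, the fetch is checked against
// `described`; on a mismatch or a read past the last described argument the
// function returns nullopt instead of reading garbage.
std::optional<double> sumBySpec(const std::vector<ArgTag>& described, const char* spec, ...) {
    assert(spec != nullptr);
    va_list ap;
    va_start(ap, spec);
    std::size_t next = 0;
    double total = 0;
    bool ok = true;
    for (const char* p = spec; *p; ++p) {
        ArgTag want;
        if (*p == 'i') { want = ArgTag::Int; }
        else if (*p == 'd') { want = ArgTag::Double; }
        else { continue; }
        if (next >= described.size() || described[next] != want) {
            ok = false;                       // the instrumented check fails
            break;
        }
        ++next;
        if (want == ArgTag::Int) total += va_arg(ap, int);
        else total += va_arg(ap, double);
    }
    va_end(ap);
    if (!ok) return std::nullopt;
    return total;
}
```

The function returns `std::nullopt` instead of calling `va_arg` on an undescribed argument. That is what keeps a mismatch from turning into a read of garbage.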
Same as with MS's "safe" string functions.
Well, those don't quite go to _these_ lengths. They just take a buffer
size along with the buffer pointer: error checking is straightforward.
Many people consider them plenty fast for production code. Windows OS
source code itself reportedly uses them:
http://msdn.microsoft.com/en-us/library/ms995349.aspx
http://download.microsoft.com/download/8/6/5/8659f5ec-6eaa-4b1f-9107-3e1ec9edf39c/secure_platform.doc
(search for "string handling"). At the very least, Microsoft is
definitely pushing them for use in production code, not just for
debugging.
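For comparison, a check in the style of those functions is a single size comparison. `copyChecked` and `fitsInEightBytes` are hypothetical names, not Microsoft's API. Microsoft's functions, such as `strcpy_s`, have their own documented error behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical checked copy: the caller passes the destination size along
// with the pointer. Returns false (leaving an empty string) if src won't fit.
bool copyChecked(char* dst, std::size_t dstSize, const char* src) {
    assert(dst != nullptr && dstSize > 0);
    std::size_t len = std::strlen(src);
    if (len + 1 > dstSize) {       // +1 for the terminating '\0'
        dst[0] = '\0';
        return false;
    }
    std::memcpy(dst, src, len + 1);
    return true;
}

// Usage: copy into an 8-byte buffer and report whether it fit intact.
bool fitsInEightBytes(const char* s) {
    char buf[8];
    bool ok = copyChecked(buf, sizeof buf, s);
    return ok && std::strcmp(buf, s) == 0;
}
```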
How much less, do you think, is passing a buffer length than passing a stack area
size, when you write "go to _these_ lengths"? I just wonder.
Are there other scenarios where cdecl is less safe than stdcall
(whether "traditional" or "expanded")?
Don't know. Please don't make me think. :-)
Ah. So you state a claim, but decline to back it with any argument.
Isn't that the same sin you often accuse me of?
I'm pretty sure that's an attempt at insinuation, but if you would quote the
claim you think has been made then this can be discussed.
Hmm, I wonder what one might call a person who refuses to think. Perhaps
one or more of the terms you used to describe me might fit?
I'm refusing to do your thinking for you.
What do you call a person who asks others to do his thinking for him?
Cheers, & hth.,
- Alf
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?