Re: Need to create a C lib - using C++ classes - is it possible
On May 25, 7:51 pm, "Alf P. Steinbach" <al...@start.no> wrote:
* James Kanze:
Anyway, this is a FAQ item,
<url:http://www.parashift.com/c++-faq-lite/mixing-c-and-cpp.html#faq-32.1>
"You must use your C++ compiler when compiling main()
(e.g., for static initialization)"
and as you note also it's stated by the Holy Standard that
static variables may be (dynamically) initialized after
entry of main(), which implicitly requires a C++ main().
Woah. There's a definite misunderstanding here. The C++
standard has a somewhat twisted explanation concerning how
static variables may be initialized after entering main, but
it is in practice unimplementable, and can effectively be
ignored. Static variables are initialized before entering
main.
I agree, for different reasons, that this part of the standard
is ungood and in fact pretty meaningless. However, it's
there. And seems to survive into C++0x.
And no implementation actually tries to take advantage of it. I
don't know of any that defer initialization until after the
first statement in main.
The issue here is what that actually means. Some compilers
(including CFront) do (or did) recognize the name main, and
generate special code for the function, which calls the
function doing the global initialization. Conceptually, this
is still "before entering main", since it is before any
statement you wrote in main is executed. But of course, it
*does* require that main be compiled with the C++ compiler in
order to ensure static initialization.
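Roughly, the generated code behaves as if you had written
something like this yourself (the function name here is
invented; each implementation has its own):

    extern "C" void __do_global_init()
    {
        // In the real scheme, this walks a table of dynamic
        // initializers collected from all of the translation
        // units. Stubbed out here so the sketch compiles.
    }

    int main()
    {
        __do_global_init();  // inserted by the compiler, before
                             // any user-written statement
        // ... the statements you actually wrote in main ...
        return 0;
    }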
But this is an implementation constraint, not a standard
constraint. The standard doesn't really say anything about how
you link C and C++ (or even how you link C++ with other C++).
Very few implementations today have this constraint. But they
have other constraints (invoke the linker with g++, rather than
gcc, for example). The whole point is that just about anything
you try to say about this issue is implementation defined.
Even though that part of the standard is IMHO defective,
talking about "after the first statement of main" instead of
entry of main.
It's defective, because the constraints that it places on
the implementation in this case are impossible to meet. But
the "after the first statement in main" is very intentional;
No, I don't think it can be intentional.
It is.
The intent is to make dynamic linking conforming. Of course,
the wording still doesn't succeed in that, even with regard to
initialization, and there are many other places where dynamic
linking introduces undefined behavior. As it happens, I
campaigned to get this statement removed, on the grounds that
it couldn't be implemented and caused real problems with
existing code; the reason given to me for why it stayed was
dynamic linking.
int main()
{
    return myCppMain();
}
"after the first statement in main" would here mean after the
program's finished.
Or never, if nothing in the translation unit of the static
variable is ever used. Although you're right that "after the
first statement of main" really means after the first statement
of main has finished, which leads to some interesting problems
as well. The intent is very much "after you're into user
written code in main".
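A trivial example shows what's at stake (the names are
invented); every implementation I know of prints "initializing
global" first, before anything from main, the deferred
initialization wording notwithstanding:

    #include <cstdio>

    int noisyInit()
    {
        std::puts( "initializing global" );
        return 42;
    }

    int global = noisyInit();  // dynamic initialization of a static

    int main()
    {
        std::puts( "first statement of main" );
        std::printf( "global = %d\n", global );
        return 0;
    }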
The original phrase (more or less) goes back to the ARM, and the
intent there was clearly to speed program start up, by not
requiring static initialisers to be executed until other code in
the module was needed, and thus, the module was paged in; at the
time, the specification of <iostream.h> required it to contain a
static variable with dynamic initialization, which meant that on
program start up, you'd get a page hit for every module which
included <iostream.h>. On the systems at the time, that could
be very noticeable.
The standard dropped the requirement concerning the static
variable; I don't know if this was intentional or an oversight
(when the previously single header was split up into several
distinct headers), but a lot of implementations of <iostream>
(e.g. g++, STLport) do define the static variable, even if it
is no longer required. And I don't notice complaints about
start up speed from them today. So maybe the issue isn't
relevant on today's machines (or maybe people concerned with
performance are simply using other compilers).
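For reference, the mechanism is std::ios_base::Init: the
implementation's <iostream> defines a static object of that
type, something like this (the variable name is invented; the
class itself is standard):

    #include <ios>

    namespace {

    // One such object per translation unit including <iostream>;
    // the first one constructed initializes cin, cout, cerr and
    // clog, so they're usable even from other static initializers.
    std::ios_base::Init streamInitializer;

    }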
The left brace is not a statement, and empty statements in C++
have to be explicitly introduced via ";".
My reading is that the intention is that initialization can
occur after the left brace of the main function's function
body, /before/ the first statement, but not later, just as in
your CFront example above. I can't make sense of anything
else.
I didn't say it made sense. I said that it was the intent.
Conceptually, in the CFront example, the initialization takes
place before entering main---at least with regards to anything
that a conforming program can tell. Or if you prefer, it is
what happens when you "execute" the opening left brace of main.
A sort of extended function prefix, so to speak---instead of
just setting up the local stack frame, the compiler generates
code to call the initializers, then sets up the local stack
frame. There is explicit wording in the standard to allow
this, but it is elsewhere: the fact that, unlike in C, you are
not allowed to call main from your code.
what happens before, and where, can simply not be determined
by a conforming program.
Huh?
There is no way a conforming program can determine whether the
initialization was the last thing before calling main in crt0
(or whatever the implementation calls its start-up code), or the
first thing in main (before any of your code is executed).
[snip]
Both Unix and Windows do object specific initialization when
you dynamically load an object. There's no difference in
them there. Both also have many different options with
regards to what is or is not visible in the various
"modules". The main differences are, I think, that 1) all
of the options in Windows are compile and link time---you
don't have any choices at load time,
What does this mean? What choices can be specified at load
time for a *nix shared library?
Whether the global symbols in the object are available when
loading other dynamic objects or not. (Specific implementations
of dlopen may have other options, but this basic choice is
specified by Posix.)
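Concretely (the library name is invented; dlopen and the
RTLD_* flags themselves are Posix):

    #include <dlfcn.h>
    #include <cstdio>

    int main()
    {
        // RTLD_GLOBAL would make the object's globals available
        // for resolving symbols in subsequently loaded objects;
        // RTLD_LOCAL keeps them private to this handle.
        void* handle = dlopen( "libplugin.so",
                               RTLD_NOW | RTLD_LOCAL );
        if ( handle == NULL ) {
            std::printf( "dlopen failed: %s\n", dlerror() );
            return 1;
        }
        dlclose( handle );
        return 0;
    }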
and 2) symbols in the root are not available to
dynamically loaded objects under Windows, and are always
available to dynamically loaded objects under Unix.
I'm not sure what you mean here.
I'm not that sure of the terminology myself; by "root", I mean
the code loaded as the initial binary image, before any dynamic
objects have been loaded. When you load a dynamic object under
Unix (using dlopen), you must specify either RTLD_GLOBAL or
RTLD_LOCAL: in the first case, all of the globals in the dynamic
object become accessible to other dynamic objects, in the
second, no. But since it's the operating system which loads the
root, you can't specify anything there. Under Unix, the global
symbols in the root are available to all other dynamic objects,
as if it had been loaded specifying RTLD_GLOBAL. Under Windows,
if I understand correctly, the root is always loaded as if
RTLD_LOCAL had been specified (and the choice for other
dynamic objects is made when they are built, rather than when
they are loaded---but I'm not really that certain about anything
in the Windows world).
But anyway, Windows DLLs enjoy a good degree of decoupling
because there are two sets of symbols: symbols linked by the
ordinary language-specific linker, which are only visible
until the DLL has been created, and symbols linked by the
Windows loader, which are the subset of the former set that are
explicitly exported or imported. All the rest, e.g. the DLL's
usage of some runtime library, is hidden.
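In source, that is usually spelled with __declspec
annotations, something like this (the macro names here are
invented):

    #ifdef MYLIB_BUILD                          // defined when
    #  define MYLIB_API __declspec(dllexport)   // building the DLL
    #else
    #  define MYLIB_API __declspec(dllimport)   // using the DLL
    #endif

    MYLIB_API int publicEntryPoint( int );  // crosses the DLL
                                            // boundary

    int internalHelper( int );  // resolved when the DLL is linked,
                                // invisible to DLL clients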
That's more or less true under Unix as well, depending on the
options. Of course, the linker under Unix isn't standardized,
and Posix allows an implementation to add any number of
additional options to dlopen, so different Unix systems will
have different capabilities here; Solaris, at least, offers
the possibility of exporting symbols on a symbol-by-symbol
basis,
creating groups of dynamic objects, with symbols visible within
the group, but not elsewhere, and who knows what else.
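With GCC on ELF platforms, for example, the per-symbol variant
looks something like this (compile the library with
-fvisibility=hidden; the names below are invented):

    #define EXPORTED __attribute__((visibility("default")))

    EXPORTED int pluginEntry( int x )   // reachable via dlsym
    {
        return x + 1;
    }

    int helper( int x )                 // hidden outside this object
    {
        return x * 2;
    }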
Historically, Unix dynamic linking was developed to allow the
sharing of object files, and by default, it tries to behave as
much as possible like static linking. But in practice, it works
very well for things like plugins (where you want a maximum of
isolation) as well, and today is probably used more for this and
for versioning than for pure sharing.
Other than
that, it's largely a question of which options you choose. (And
you certainly don't have per DLL dynamic storage under Windows
unless you want to. I know that the Windows applications where
I work don't have it.)
Not sure what you mean by "DLL dynamic storage", and even if I
did understand that term I suspect that I wouldn't understand
the complete sentence, ending with "unless you want to". What
I wrote about was per-thread storage, and problems with that
in the context of automatic initialization and cleanup calls
from the OS.
I thought you were talking about the common complaint that you
can't free memory in a different DLL than the one it was
allocated in. Which in fact depends on how you link; the
Windows specialists here have no trouble with it, for example.
[...]
The company has a large application written in C. They're
not going to rewrite the whole thing. As subsystems get
rewritten, they're rewritten in C++. It's just good
engineering.
I'm not convinced that it is, in the sense of migration to
C++.
However, I think it could be good engineering in the sense of
using C++ as a "restricted C", i.e. a C with more strict type
checking.
No. Although that too is IMHO a good step. But a large
application will likely be organized into many sub-systems, and
in general, management prefers that when reworking one
sub-system, you not touch any other.
A C program has a much more procedural structure than proper
C++ code, and replacing parts with C++ means forcing use of
C++ in procedural, non-OO mode.
It depends on the C. Long before I'd ever heard of C++, the way
I structured C was to define a struct and a set of functions
which manipulated it, and cross my fingers that no one
manipulated the struct other than with my functions. From what
I can see in third party libraries today, this seems to be more
or less common practice.
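In code, the idiom is something like this (all of the names
are invented):

    /* The "class": a struct plus the functions which manipulate
       it. Nothing stops a caller poking at value directly, hence
       the crossed fingers. */
    typedef struct Counter { int value; } Counter;

    void counterInit( Counter* c )      { c->value = 0; }
    void counterIncrement( Counter* c ) { c->value += 1; }
    int  counterGet( Counter const* c ) { return c->value; }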
I guess with proper insulating abstractions, like XCOM, it
could be better, but when the aim is to "migrate" to C++ I
doubt such abstractions will be in place.
You don't need XCOM. You do need to provide two interfaces, a
C++ interface (which will be used by new code, written in C++),
and a C interface which is compatible with the previous C
interface. But then, you usually have to ensure backwards
compatibility anyway.
It's difficult to understand the first sentence here, which is
seemingly a tautology. One must assume that what you mean is
that "XCOM or similar technologies do not provide any
significant advantage for ...", for what?
You have an existing interface, defined in C. You reimplement
the sub-system in C++, using classes, and defining an interface
which uses classes. You then implement the C interface,
forwarding to the new classes.
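The forwarding layer can be quite thin. A sketch, with all of
the names invented:

    // The new C++ interface, used by new code.
    class Parser
    {
    public:
        int parse( char const* text ) { return text != 0; } // stub
    };

    // The C-compatible interface, forwarding to the class.
    extern "C" {
        typedef struct ParserHandle ParserHandle;  // opaque to C

        ParserHandle* parserCreate()
        {
            return reinterpret_cast< ParserHandle* >( new Parser );
        }

        int parserParse( ParserHandle* h, char const* text )
        {
            return reinterpret_cast< Parser* >( h )->parse( text );
        }

        void parserDestroy( ParserHandle* h )
        {
            delete reinterpret_cast< Parser* >( h );
        }
    }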
I'm not familiar with XCOM, but I don't think that this is
similar. (It's obviously a lot easier if the original
application did use some sort of isolation layer, like Corba or
XCOM. But most don't.)
However, technically it should be no big deal to write
extern "C" int c_language_main( int, char*[] );
int main( int argc, char* argv[] )
{
    return c_language_main( argc, argv );
}
and compile that top-level as C++.
Technically, no. Practically, it depends. There may be
very good reasons for not doing so.
Such as?
Such as the fact that you don't have access to the main. It's
not part of the sub-system your group is responsible for.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34