Re: StateFull vs Stateless Singleton

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Wed, 7 Jul 2010 02:15:07 -0700 (PDT)
Message-ID: <513cb353-eefd-40db-856d-06a17cb663d3@j4g2000yqh.googlegroups.com>

On Jul 6, 11:32 pm, Öö Tiib <oot...@hot.ee> wrote:

On 6 July, 16:43, James Kanze <james.ka...@gmail.com> wrote:

On Jul 6, 1:21 am, Öö Tiib <oot...@hot.ee> wrote:

On 5 July, 21:06, James Kanze <james.ka...@gmail.com> wrote:

On Jul 5, 6:05 pm, Öö Tiib <oot...@hot.ee> wrote:

[...]

If there is some user command to change them later, then
they aren't command line options. Long running programs
don't normally use command line options (other than to
specify the configuration files); they use configuration
files, and if the user can issue commands to modify the
configuration on the fly, they probably rewrite the
configuration file as well. Command line options are for
short running programs, which do one thing and exit (like
a compiler). My options are variables at namespace scope.
Preferably local to the module in which they are used. The
constructor calls a function in CommandLine which enrols
them.

While convenient at some levels, this solution isn't
perfect. You've still got to find all of those options when
it comes to writing the man page or help text. But
globally, I've found it very useful to be able to define an
option in the module which uses it, without having to modify
any central code which "knows" all of the options.

OK, I have used a slightly different approach. Each option is mapped
to a command (a function or functor). As options are parsed, the
commands are executed. As a result, the command line is parsed,
passed around and forgotten immediately. The same commands can be
executed later by other means if for some reason an option has to be
changed. Short help texts and hints can be kept alongside the
commands if needed. In the simplest case it is a static constant
array in an unnamed namespace.
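A minimal sketch of the table-of-commands approach described above. It assumes plain function pointers as the commands; OptionCommand, executeOptions, the two commands and the option names are hypothetical illustrations, not code from the thread:

```cpp
#include <cassert>
#include <string>

// Hypothetical option -> command table, kept in an unnamed namespace.
namespace {

int verbosity = 0;
std::string outputName = "a.out";

// The commands: ordinary functions, so they can also be called later.
void increaseVerbosity(char const*) { ++verbosity; }
void setOutput(char const* value) { outputName = value; }

struct OptionCommand
{
    char const* name;                // option as typed on the command line
    bool takesArgument;              // does the command consume the next argument?
    void (*command)(char const*);    // executed as soon as the option is parsed
    char const* help;                // short help text kept next to the command
};

OptionCommand const optionCommands[] = {
    { "-v", false, &increaseVerbosity, "increase verbosity" },
    { "-o", true,  &setOutput,         "name of the output file" },
};

}

// Executes the command of each option as it is parsed; the command line
// itself is not kept afterwards. Returns false on an unknown option or a
// missing option argument.
bool executeOptions(int argc, char const* const argv[])
{
    for (int i = 1; i < argc; ++i) {
        OptionCommand const* found = nullptr;
        for (OptionCommand const& oc : optionCommands) {
            if (std::string(argv[i]) == oc.name) {
                found = &oc;
                break;
            }
        }
        if (found == nullptr)
            return false;
        char const* argument = nullptr;
        if (found->takesArgument) {
            if (i + 1 >= argc)
                return false;
            argument = argv[++i];
        }
        found->command(argument);
    }
    return true;
}
```

Since the commands are ordinary functions, something like setOutput could also be called later by other means (say, from an interactive command), which is the reuse described above.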

That's more or less what I do as well: each option is mapped to
a command. But practically all commands have state, since
options control the later execution of the program. And there
are some more or less standard cases: BooleanOption,
NumericOption, StringOption, etc.

The state is necessary because exploiting the command line is
really a two-phase operation: first you parse out the options,
then you treat what is left as an array of filenames. (Most of
the time. My CommandLine class doesn't enforce this---once the
options have been parsed out, it looks very much like an
std::vector<std::string>.)
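Roughly how options that are variables at namespace scope, and that enrol themselves with CommandLine from their constructors, might look. The class names CommandLine, Option and BooleanOption come from the post; every member shown (enrol, set, parse, name) is an assumption, and the parse is deliberately simplistic (no option arguments, no "--"). It shows the two phases: parse removes the options and returns the rest, to be treated as filenames.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class Option;

// Hypothetical registry. The list is a function-local static, so it exists
// even when an Option is constructed during the static initialization of
// another translation unit (sidestepping order of initialization issues).
class CommandLine
{
public:
    static void enrol(Option* option)
    {
        assert(option != nullptr);
        registry().push_back(option);
    }

    // First phase: remove the recognized options. What is returned is the
    // input to the second phase, usually treated as a list of filenames.
    static std::vector<std::string> parse(std::vector<std::string> const& args);

private:
    static std::vector<Option*>& registry()
    {
        static std::vector<Option*> options;
        return options;
    }
};

class Option
{
public:
    explicit Option(std::string name) : name_(std::move(name))
    {
        CommandLine::enrol(this);    // enrolment happens in the constructor
    }
    virtual ~Option() = default;
    std::string const& name() const { return name_; }
    virtual void set() = 0;          // called when "-name" is seen

private:
    std::string name_;
};

inline std::vector<std::string>
CommandLine::parse(std::vector<std::string> const& args)
{
    std::vector<std::string> rest;
    for (std::string const& arg : args) {
        bool matched = false;
        for (Option* option : registry()) {
            if (arg == "-" + option->name()) {
                option->set();
                matched = true;
                break;
            }
        }
        if (!matched)
            rest.push_back(arg);
    }
    return rest;
}

// One of the predefined option types: it just sets a boolean.
class BooleanOption : public Option
{
public:
    using Option::Option;
    void set() override { value_ = true; }
    bool operator()() const { return value_; }

private:
    bool value_ = false;
};

// In the module that uses it: an option at namespace scope, local to it.
namespace {
BooleanOption verbose("v");
}
```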

Probably it is because I like a somewhat extended interface for
command line applications ...
If some mandatory option is missing from the command line, then the
program may ask for it (instead of reporting an error).

That's fairly easy to implement on top of my interface. In my
code, if some mandatory option is missing, I'll abort with an
error message, but the same code which does this could ask for
it instead. (Since the command line programs I write are used
more often in scripts than directly, going interactive isn't an
option.)

When no options are given at all, it may enter interactive
mode (instead of printing a few lines about typical usage). Such
twists are not too uncommon, and they make testing it simpler.

Again, going interactive sort of defeats the purpose of using
a command line driven program: you don't want to go interactive
in the middle of a script. But it shouldn't be that difficult
to implement: just check if argc is 1 before calling
CommandLine::parse, and if it is, do something else.
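A hypothetical sketch of both points: the same code that aborts on a missing mandatory option can ask for it instead, and the argc check selects which behaviour applies. requireOption, MissingPolicy and policyFor are invented for illustration; reading the answer from a std::istream parameter (rather than directly from std::cin) is an assumption that keeps it testable.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>      // std::istringstream can stand in for std::cin in tests
#include <stdexcept>
#include <string>

// Hypothetical policy for a missing mandatory option.
enum class MissingPolicy { Abort, Ask };

// Returns the value of a mandatory option. With Abort (the sensible choice
// for programs run from scripts) a missing option is an error; with Ask, the
// same code prompts for it on `in` instead.
std::string requireOption(std::map<std::string, std::string> const& options,
                          std::string const& name,
                          MissingPolicy policy,
                          std::istream& in = std::cin,
                          std::ostream& prompt = std::cerr)
{
    auto it = options.find(name);
    if (it != options.end())
        return it->second;
    if (policy == MissingPolicy::Abort)
        throw std::runtime_error("missing mandatory option: " + name);
    prompt << name << "? ";
    std::string value;
    std::getline(in, value);
    return value;
}

// The argc check from the text: with no arguments at all, run interactively
// (asking for values); otherwise parse the command line and abort on errors.
MissingPolicy policyFor(int argc)
{
    return argc == 1 ? MissingPolicy::Ask : MissingPolicy::Abort;
}
```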

OK, there is just one command line, and pretending there may be
several is pointless when there are no other sources of
options.

Even when much of the configuration may come from
a configuration file, it's useful to access it through a single
co-ordinator. (I have, or maybe only had, code somewhere which
does this. It's a singleton which maintains a list of
configuration sources; when you ask it for a value, it goes
through the list, returning the first it finds. First in the
list is typically a collection of options, registered with the
command line, followed by an object which reads environment
variables, followed by one or more files, followed by an object
which returns the defaults.)
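A sketch of such a co-ordinator, assuming a C++17 compiler for std::optional. Configuration, ConfigurationSource, MapSource and EnvironmentSource are hypothetical names; a file source is omitted, but a MapSource filled from a parsed file would play that role. Sources are consulted in the order they were added, and the first one that has the value wins:

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One source of configuration values (options, environment, file, defaults).
class ConfigurationSource
{
public:
    virtual ~ConfigurationSource() = default;
    virtual std::optional<std::string> get(std::string const& name) const = 0;
};

// Values held in a map: registered options, a parsed file, or the defaults.
class MapSource : public ConfigurationSource
{
public:
    explicit MapSource(std::map<std::string, std::string> values)
        : values_(std::move(values)) {}
    std::optional<std::string> get(std::string const& name) const override
    {
        auto it = values_.find(name);
        if (it == values_.end())
            return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::string> values_;
};

// Looks the name up in the process environment.
class EnvironmentSource : public ConfigurationSource
{
public:
    std::optional<std::string> get(std::string const& name) const override
    {
        char const* value = std::getenv(name.c_str());
        if (value == nullptr)
            return std::nullopt;
        return std::string(value);
    }
};

// The single co-ordinator: asks each source in turn, returns the first hit.
class Configuration
{
public:
    static Configuration& instance()
    {
        static Configuration theOne;
        return theOne;
    }
    void addSource(std::unique_ptr<ConfigurationSource> source)
    {
        assert(source != nullptr);
        sources_.push_back(std::move(source));
    }
    std::optional<std::string> get(std::string const& name) const
    {
        for (auto const& source : sources_) {
            if (auto value = source->get(name))
                return value;
        }
        return std::nullopt;
    }
private:
    Configuration() = default;
    std::vector<std::unique_ptr<ConfigurationSource>> sources_;
};
```

Adding the sources in the order described above (options, environment, files, defaults) means the defaults are used only when nothing else supplies a value.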

It is difficult to achieve if there are components written by third
parties. It is a bit easier to give them an interface for adding
command line commands.

If the components are obtained already written, they won't
conform to any system you define. If the third party is writing
them on contract for you: my "interface" as to how to add
a command line option is to derive from Option, and declare
a static instance of the derived class (or just use one of the
pre-defined classes, for the frequent cases like just setting
a boolean variable, or getting the name of a file as a string).

If handling of configuration files must be
centralized, then I prefer to provide the simplest interface:

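#include <vector>

namespace configuration
{
// ModuleID (defined elsewhere) identifies the module that owns the bytes.
void read( ModuleID id, std::vector<char>& bytes );
void write( ModuleID id, std::vector<char> const& bytes );
}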

This begs the question. A configuration file needs an internal
representation to contain its data. You have to define this
internal representation somewhere. It needs some sort of means
of accessing the data: the most widespread format uses a two
level access, so you end up with a more or less complicated
interface where you have to specify both a section and a name to
access a value. (Why two levels, I don't know: for a lot of
simple applications, one level is largely sufficient, and for
larger applications, you may want more than two levels. Which
can be easily simulated by a naming convention, but you might
want to structure the file itself more.)
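To make the two-level (section, name) access concrete, here is a hypothetical minimal internal representation of an INI-style file. IniConfiguration is an invented name, and the parser handles only [section] headers, name = value lines, and # or ; comments. A dotted section name such as log.file shows the naming convention that can simulate more than two levels.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical internal representation: section -> (name -> value).
class IniConfiguration
{
public:
    explicit IniConfiguration(std::istream& in)
    {
        std::string line;
        std::string section;            // names before any header go in ""
        while (std::getline(in, line)) {
            line = trim(line);
            if (line.empty() || line[0] == '#' || line[0] == ';')
                continue;
            if (line.front() == '[' && line.back() == ']') {
                section = trim(line.substr(1, line.size() - 2));
            } else if (auto eq = line.find('='); eq != std::string::npos) {
                data_[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
            }
        }
    }

    // Two-level access: both a section and a name are needed.
    std::optional<std::string> get(std::string const& section,
                                   std::string const& name) const
    {
        auto s = data_.find(section);
        if (s == data_.end())
            return std::nullopt;
        auto v = s->second.find(name);
        if (v == s->second.end())
            return std::nullopt;
        return v->second;
    }

private:
    static std::string trim(std::string const& s)
    {
        auto first = s.find_first_not_of(" \t\r");
        if (first == std::string::npos)
            return "";
        auto last = s.find_last_not_of(" \t\r");
        return s.substr(first, last - first + 1);
    }

    std::map<std::string, std::map<std::string, std::string>> data_;
};
```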

The simplest interfaces are the simplest to extend. I may handle it
as an INI or XML file, but to me it is a set of bytes that I preserve
in exact form.

It can't stay that way forever. At some point, you need to
extract the actual information.

When all the modules are mine, then of course it is cheaper
to have a more extended interface, so that each configurable value
can be retrieved from a common tree (or one of its branches).

In a multithreaded application, the thread manager (which
ensures e.g. a clean shutdown) must be a singleton.

At the moment, yes: the application (or some framework that it
uses) has to orchestrate shutdown itself. Again, even when I cannot
get rid of single things, I do not like
"ShutDownManager::instance().doIt()". I prefer to have
a shutDown() function. If nothing else, faking it for a unit
test is a lot cheaper.

The main argument against this solution is that it doesn't work.
The task manager has to know about all of the tasks, so that it
can signal them in case of a shutdown request, then wait until
they've finished.

Exposing something to a function is actually as hard as exposing it
to a class, isn't it? OK, a task manager. I still do not see why there
must be a class for it.

It needs to be a container, which can contain a variable number
of entries. Thus, a class. You can use std::vector<Thread*> if
you want, but in that case, you may still want to handle order
of initialization issues, in order to allow threads to be
started from the constructor of a static object. (Not usually
a good idea, but there are probably cases where it is
justified.)

There are so many possible ways to handle and manage
concurrency. I think there are at least 4 major ways, plus
endless subtypes. Some (like OpenMP) make it the business of the
implementation; others deal with threads very closely and
explicitly. I do not think that any of them starts threads
from constructors of static objects, but who knows.

Who said anything about starting a thread from the constructor
of a static object? You call a function (typically a member of
a Thread object) to start a thread. A thread itself has two
distinct software components, the code to be executed, and
a component which represents the thread itself: its
meta-information, like state. (Many early thread designs
confounded the two.) But if you want to shut down cleanly, you
need to be able to find all of those threads, in order to notify
them, and to wait for them to terminate. (Calling exit() in one
thread, while other threads are still running, will generally
result in undefined behavior.) So each thread must register
itself somewhere; that registry must be unique, and so
a singleton.
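A minimal sketch of such a registry, written with C++11 std::thread (which post-dates this thread); ThreadManager and its members are hypothetical. Every thread is started through the manager and blocks only via waitForShutdown(), so it can be unblocked; shutDown() refuses new threads, signals the running ones, and joins them:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread registry: unique, so that shutDown() can find
// every thread, signal it, and wait for it.
class ThreadManager
{
public:
    static ThreadManager& instance()
    {
        static ThreadManager theOne;
        return theOne;
    }

    // Starts and registers a thread; refuses once shutdown has begun.
    bool start(std::function<void(ThreadManager&)> body)
    {
        assert(static_cast<bool>(body));
        std::lock_guard<std::mutex> lock(mutex_);
        if (stopping_)
            return false;            // creation of new threads is blocked
        threads_.emplace_back([this, body = std::move(body)] { body(*this); });
        return true;
    }

    // Used by managed threads instead of blocking indefinitely: returns
    // true as soon as shutdown is requested, false when the timeout expires.
    template <class Rep, class Period>
    bool waitForShutdown(std::chrono::duration<Rep, Period> timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return stopping_; });
    }

    // Signals (and unblocks) all threads, then waits for them to finish.
    // Must not be called from a managed thread: it would join itself.
    void shutDown()
    {
        std::vector<std::thread> toJoin;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
            toJoin.swap(threads_);
        }
        cv_.notify_all();
        for (std::thread& t : toJoin)
            t.join();
    }

private:
    ThreadManager() = default;
    ~ThreadManager() { shutDown(); }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

Joining every thread before shutDown() returns is what makes it safe to call exit() afterwards, since no other thread is still running at that point.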

namespace taskmanager {
void shutDown(); // shortest thing to say.
// other operations with task manager.
}

And where do you put the data that taskmanager::shutDown needs
to manipulate? Data which is also accessed each time you start
a thread, etc.

Why not in some personal vector of taskmanager?

Because namespaces can't have "personal" members? You can't
protect access to a namespace. (I'd probably make TaskManager
a singleton class, with the actual implementation a forward
declared private member class, for a maximum of encapsulation.)
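The encapsulation suggested here (a singleton class whose implementation is a forward-declared private member class) might be sketched as follows, with a static interface. TaskManager, Impl, registerTask and taskCount are hypothetical; in real code the Impl definition and the function bodies would live in a separate source file, so users of the header see no data at all. A namespace can only hide its data in an unnamed namespace, which confines it to one translation unit.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// What the header would show: no data members are visible.
class TaskManager
{
public:
    static void registerTask(std::string const& name);
    static std::size_t taskCount();

private:
    class Impl;                      // forward declared, defined only in the .cpp
    static Impl& impl();
};

// What the implementation file would contain.
class TaskManager::Impl
{
public:
    std::mutex mutex;
    std::vector<std::string> tasks;
};

TaskManager::Impl& TaskManager::impl()
{
    static Impl theImpl;             // the single instance, created on first use
    return theImpl;
}

void TaskManager::registerTask(std::string const& name)
{
    assert(!name.empty());
    Impl& i = impl();
    std::lock_guard<std::mutex> lock(i.mutex);
    i.tasks.push_back(name);
}

std::size_t TaskManager::taskCount()
{
    Impl& i = impl();
    std::lock_guard<std::mutex> lock(i.mutex);
    return i.tasks.size();
}
```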

Everyone has to give their tasks to the taskmanager without
knowing how many threads there are for the job. Is that some
other model of concurrency?

It depends on why you are using threads. In a server, for
example, there will be one or two threads per connection. In
a GUI, a thread may be explicitly created in order to handle
a specific request. About the only time you don't really know
how many threads you want to create is when you're using threads
for parallelization. (Of course, the thread you create when
a client logs onto your server may create other threads, to do
specific tasks for it.)

None of this is really relevant to what I was saying, however.
It doesn't matter who creates the threads, or how many they may
create. Every thread must register with some central, unique
thread manager, and provide some means for this thread manager
to shut it down. This is really private to the threading
subsystem (except that you'd generally provide an externally
accessible function to trigger the shutdown---and usually
functions to iterate through the threads, display their status,
etc., for debugging purposes).

Yes, some things may need a specific thread. For example, OpenGL
is AFAIK implemented so that you can draw to a given display from
one thread only. However, the presence of such threads may also be
made known to the taskmanager (with some special function).

That's a different issue. Some components may use threads, and
may require a specific registry of the threads they're using.
(In the case of a GUI, there will normally be one thread per
window, and all updates to the window occur in that thread. The
system will, however, provide some means of posting a request
to the window.)

It might feel like the C way of information hiding, but
unnamed namespaces are exactly for that, as I understand
the standard. Sure, it may be a class too.

Unnamed namespaces are only accessible from the translation unit
in which they occur. Something like a thread probably has
a couple of different globally accessible functions; if it's in
library code, each will be in a separate file.

A thread is not an object for me. It is more like a running function
(an activity). Threads usually communicate with messages or signals.
The basic functionality of receiving messages, accepting jobs to do,
and reporting back about problems may be the same for all threads.
In that case, the place from which they get their tasks may also be
centralized, and located where that function is implemented.
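A minimal sketch of a thread whose basic behaviour is receiving jobs as messages and running them, assuming C++11 threads; Worker and post() are hypothetical names. The same shape fits the GUI case mentioned earlier, where other threads post requests to the one thread that owns a window:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical worker: its only behaviour is taking jobs from a queue.
class Worker
{
public:
    Worker() : thread_([this] { run(); }) {}

    // Signals the thread, lets it finish the queued jobs, and joins it.
    ~Worker()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Called from any thread to hand a job to this worker.
    void post(std::function<void()> job)
    {
        assert(static_cast<bool>(job));
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return;          // done_ is set and nothing is left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                   // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread thread_;             // declared last: starts after the rest exist
};
```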

A thread has behavior (in addition to just executing its code)
and state. It's necessary to maintain the meta-information
concerning the thread's state somewhere.

The tasks that a thread does may have some other mechanism for
canceling them halfway through; for example, they may get some
signal interface to attach to and observe. That really needs
a whole separate newsgroup about how to implement concurrency in
C++; there probably are some on Usenet.
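One possible shape for a signal interface that a task can attach to and observe, sketched under the assumption that observers are plain callbacks; CancellationSignal and countUntilCancelled are hypothetical. A task can poll cancelled() between steps, or attach a callback to be told (for example, to wake itself up if it is blocked):

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical cancellation signal shared between a task and its owner.
class CancellationSignal
{
public:
    bool cancelled() const { return cancelled_.load(); }

    // Attaches an observer; if cancellation already happened, it runs at once.
    void attach(std::function<void()> observer)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        if (cancelled_) {
            lock.unlock();
            observer();
            return;
        }
        observers_.push_back(std::move(observer));
    }

    // Only the first call notifies; observers run outside the lock.
    void cancel()
    {
        std::vector<std::function<void()>> toNotify;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (cancelled_)
                return;
            cancelled_ = true;
            toNotify.swap(observers_);
        }
        for (auto& observer : toNotify)
            observer();
    }

private:
    std::mutex mutex_;
    std::atomic<bool> cancelled_{false};
    std::vector<std::function<void()>> observers_;
};

// A task that checks the signal between steps, so it can stop half way.
int countUntilCancelled(CancellationSignal const& signal, int limit)
{
    int done = 0;
    while (done < limit && !signal.cancelled())
        ++done;
    return done;
}
```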

If you need clean shutdown (not all applications do, and I've
written programs where the only way of stopping them was a "kill
-9"), then you need to be able to stop threads in
a deterministic amount of time. If every thread except the root
thread runs in a deterministic amount of time, this is trivial:
just block the creation of new threads, and wait. Otherwise,
there must be a means of signaling each thread (and unblocking
it), so that it can shutdown cleanly. And a means of signaling
each thread presupposes a means of finding each thread.

(Thinking about it, this seems to be a common theme of
singletons. They have to know about all of the instances of
something else: threads, temporary files, etc.)

Yes. For example, factories. I prefer factory functions when
there are constraints such that creation (or destruction, or reuse)
of something should not be done in the usual ways. How a factory
function achieves that is an implementation detail. It may use real
singletons that are defined in (an unnamed namespace in) the
translation unit that defines the factory function.
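A sketch of a factory function that hides reuse behind an ordinary call, with the pool kept as a singleton in an unnamed namespace of the same translation unit. Connection, acquireConnection and idlePool are invented for illustration, and the pool is not thread-safe as written:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical resource that should be reused rather than created freely.
class Connection
{
public:
    int id() const { return id_; }

private:
    explicit Connection(int id) : id_(id) {}
    friend std::shared_ptr<Connection> acquireConnection();
    int id_;
};

namespace {
// The "real singleton", hidden in this translation unit: the idle pool.
std::vector<std::unique_ptr<Connection>>& idlePool()
{
    static std::vector<std::unique_ptr<Connection>> pool;
    return pool;
}
int nextId = 1;
}

// Factory function: callers cannot tell whether they get a new or a reused
// object. The custom deleter puts the object back into the pool.
std::shared_ptr<Connection> acquireConnection()
{
    std::unique_ptr<Connection> c;
    if (idlePool().empty()) {
        c.reset(new Connection(nextId++));
    } else {
        c = std::move(idlePool().back());
        idlePool().pop_back();
    }
    return std::shared_ptr<Connection>(c.release(), [](Connection* p) {
        idlePool().push_back(std::unique_ptr<Connection>(p));
    });
}
```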

The classical singleton is nothing more than a factory function.
With the twist that it always returns the same instance, rather
than a new one each time.
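The difference can be shown in a few lines; Logger and its two functions are hypothetical. create() is an ordinary factory function, while instance() is the same idea with the twist that every call returns the same object:

```cpp
#include <cassert>
#include <memory>

class Logger
{
public:
    // Ordinary factory function: a new instance on every call.
    static std::unique_ptr<Logger> create()
    {
        return std::unique_ptr<Logger>(new Logger);
    }

    // Classical singleton: a factory function that always returns the
    // same instance (a function-local static, created on first use).
    static Logger& instance()
    {
        static Logger theOne;
        return theOne;
    }

private:
    Logger() = default;
};
```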

I believe it is a bad twist, and it does not fit too well into C++,
where you may have free functions.

It's neither bad nor good. It becomes bad or good depending on
what you do with it.

You can often use free functions as well. But if you have state
(immutable or otherwise), it usually makes more sense to use
a class, in order to control access to the state.

Every time i see one it feels like
"std::sorter::instance().sort(from, to)".

I've never seen anything that silly (at least not in C++).
A "sorter" doesn't have any state that is maintained between
instances, so wouldn't be a singleton. For this reason,
std::sort is not a class, but a function. (But the standard
isn't always very clean about this distinction.
std::list<>::sort is a member function, because it needs access
to the internal details of list<> in order to have an efficient
implementation. A much better solution would be to have
a specialization of the std::sort free function, which is
a friend. But that would probably require some sort of partial
specialization of functions; I'm not sure if it could be done
with just overloading and SFINAE.)
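A sketch of the overloading approach under stated assumptions: sortAll is a hypothetical free function in a hypothetical namespace util (adding overloads of std::sort to namespace std is not allowed), and it takes a whole container rather than an iterator pair. The generic version uses std::sort, which needs random-access iterators; partial ordering of function templates selects the std::list overload, which forwards to list::sort. With an iterator-pair interface, a list iterator gives no access to its list, which is probably part of why the question raised above is harder.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

namespace util {

// Generic version: needs random-access iterators.
template <class Container>
void sortAll(Container& c)
{
    std::sort(c.begin(), c.end());
}

// More specialized overload, chosen for std::list: the member sort can
// relink nodes instead of needing random access.
template <class T, class Alloc>
void sortAll(std::list<T, Alloc>& l)
{
    l.sort();
}

}
```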

When it is surely the same object whose member functions I call,
then I prefer not to have that object on the table at all.

Whether the object is visible to the user or not is really not
an important issue. I tend to use static member functions, so
the user doesn't have to consider the instance. But when
obtaining the instance is non-trivial (e.g. it may require
a lock), that can have serious performance implications. There
is no universal solution.

I can do nothing with that object anyway, except call its member
functions.

If obtaining the object is not free, then it may be an advantage
to obtain it once, rather than to obtain it each time you need
to access one of its functions (directly or indirectly).
I switched to a static interface (only static member functions,
so you never call instance()) in CommandLine, because its
functions are definitely called seldom enough, and because
getting the instance is very fast. In some earlier projects,
however, I've used a standard singleton, and avoided calling the
instance() function in tight loops for performance reasons.
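A hypothetical illustration of both styles. Counter::instance() takes a lock on every call (standing in for any non-trivial lookup, as in pre-C++11 code where local statics were not guaranteed thread-safe); record() is a static interface that calls it each time, which is fine for rare calls, while sumWithCachedInstance obtains the instance once before a tight loop. All names are invented, and add() itself is not synchronized:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

class Counter
{
public:
    // Obtaining the instance is deliberately not free here: it locks.
    static Counter& instance()
    {
        std::lock_guard<std::mutex> lock(instanceMutex());
        static Counter theOne;
        return theOne;
    }

    // Static interface: callers never see instance(). Fine for rare calls.
    static void record(int n) { instance().add(n); }

    void add(int n) { total_ += n; }
    int total() const { return total_; }

private:
    Counter() = default;
    static std::mutex& instanceMutex()
    {
        static std::mutex m;
        return m;
    }
    int total_ = 0;
};

// In a tight loop, obtain the instance once instead of on every iteration.
int sumWithCachedInstance(std::vector<int> const& values)
{
    Counter& counter = Counter::instance();   // one lookup, one lock
    for (int v : values)
        counter.add(v);
    return counter.total();
}
```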

--
James Kanze