Re: How to make this program more efficient?
On Sep 17, 3:13 pm, Jerry Coffin <jcof...@taeus.com> wrote:
> In article <a9c3da76-0b29-46c0-b2ef-510d42322355
> @c65g2000hsa.googlegroups.com>, james.ka...@gmail.com says...
> [ ... ]
> > So it's one bit per thread, rather than a single bit. So
> > for n threads, you need n bits, rather than just lg(n).
> > There's something I'm not understanding.
> Yes -- my original article was rather misleading, but this
> does not optimize space. Rather, it optimizes speed at the
> expense of a little extra space. That is, you avoid all your
> reader threads contending for a single resource (the reference
> count). Instead, each has its own "reference count" and only
> has to contend with the writer thread for its use. From a
> viewpoint of space utilization this typically won't be any
> different from having a full-blown reference count for each
> thread -- i.e. you want to store the bits unpacked so each
> word can be accessed independently of the others.
You mean you have to store the bits unpacked, if it is to
work:-). When you use bit fields (or do the masking by hand),
the compiler generates a read/modify/write cycle, with no
intervening locks or synchronization. (On some machines, e.g.
early Dec Alphas, even if you use a complete byte per bit, the
hardware would transform this into a read/modify/write, with no
intervening locks or synchronization. From what I understand,
the upcoming thread handling in the standard will not allow
this.)
I'm still not too sure how your scheme works. In order to know
whether to free or not, you have to access the "bits" written by
the other threads, which means some sort of synchronization is
needed there as well.
And of course, this could increase memory use enormously, if,
say, you had a couple of thousand threads. (One of the servers
I currently work on regularly runs with several hundred threads,
and the number is growing.)
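As I understand the scheme, it might be sketched like this, using the upcoming standard's atomics as a stand-in for whatever platform primitives the real code would use (the fixed thread count and the 64-byte alignment are my assumptions):

```cpp
#include <atomic>
#include <cstddef>

// One unpacked "in use" flag per reader thread, each padded onto its
// own cache line so readers never contend with one another -- only
// with the writer, which scans all the flags.
constexpr std::size_t kMaxThreads = 8;   // assumed fixed thread count

struct alignas(64) ReaderFlag {          // 64 = assumed cache-line size
    std::atomic<int> in_use{0};
};

ReaderFlag g_flags[kMaxThreads];

void reader_enter(std::size_t tid) {
    g_flags[tid].in_use.store(1, std::memory_order_seq_cst);
}

void reader_leave(std::size_t tid) {
    g_flags[tid].in_use.store(0, std::memory_order_release);
}

// Writer side: the old copy of the shared resource may be reclaimed
// only once every reader flag reads zero.
bool writer_may_reclaim() {
    for (std::size_t i = 0; i < kMaxThreads; ++i)
        if (g_flags[i].in_use.load(std::memory_order_acquire))
            return false;
    return true;
}
```

Note the alignment: with a couple of thousand threads, a cache line per flag is exactly the memory blow-up I mentioned above.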
With a reference count, a single use of the map (or
whatever resource you're sharing) involves the following steps:
1) obtain mutex on reference count.
2) read, modify and write the reference count.
3) release the mutex on the reference count.
4) Use the shared resource.
5) obtain mutex on reference count.
6) read, modify and write the reference count.
7) release the mutex on the reference count.
With a mutex for each thread, the cycle is:
1) set the mutex.
2) use the shared resource.
3) clear the mutex.
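The two cycles above come out roughly as follows in code, spelled here with the upcoming standard's std::mutex (a platform mutex works the same way; the names are mine):

```cpp
#include <mutex>

std::mutex g_count_mutex;   // guards the shared reference count
int        g_ref_count = 0;

void use_with_ref_count() {
    { std::lock_guard<std::mutex> lk(g_count_mutex); ++g_ref_count; }
    // steps 1-3: lock, RMW the count, unlock
    /* step 4: use the shared resource */
    { std::lock_guard<std::mutex> lk(g_count_mutex); --g_ref_count; }
    // steps 5-7: lock, RMW the count, unlock
}

std::mutex g_per_thread_mutex;  // in the real scheme, one per reader

void use_with_per_thread_mutex() {
    std::lock_guard<std::mutex> lk(g_per_thread_mutex);  // step 1
    /* step 2: use the shared resource */
}                                                        // step 3
```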
The number of steps can be somewhat misleading though: steps 2
and 6 of the original set each involve an RMW cycle by
themselves, which is completely absent in the second set. In
addition, obtaining the mutex (steps 1 and 5) takes longer as
the number of threads grows, another feature that's absent
from the second set.
Most modern machines have the necessary instructions to support
a non-locking atomic increment or decrement. When the reference
count scheme is used, these can be used instead of the mutexes;
with reference counting and swapping pointers, I'm pretty sure
that I could get the whole thing to work on a modern (v9 or
later) Sparc without a single mutex. It would take a little bit
of assembler, however. Even with relaxed memory order.
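Using the atomics proposed for the upcoming standard (std::atomic) rather than hand-written Sparc assembler, the non-locking reference count might look like this (the helper names are mine):

```cpp
#include <atomic>

// A lock-free reference count: on a v9 Sparc the fetch_add/fetch_sub
// compile down to a cas loop, on x86 to lock xadd -- no mutex anywhere.
std::atomic<int> g_refs{0};

void acquire_ref() {
    g_refs.fetch_add(1, std::memory_order_acquire);
}

// Returns true for the thread that dropped the last reference, i.e.
// the one responsible for freeing the shared resource.
bool release_ref() {
    return g_refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```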
James Kanze (GABI Software) email:firstname.lastname@example.org
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34