Re: __gnu_cxx::hash_set efficient union

From:

mark.dufour@gmail.com

Newsgroups:

comp.lang.c++

Date:

Fri, 17 Aug 2007 16:15:59 -0000

Message-ID:

<1187367359.851894.220580@g4g2000hsf.googlegroups.com>

I'm sure the effect is negligible, but is there any reason
you're dynamically allocating the set here? I'd have though
set itself is a fairly small object so creating it on the
stack and returning that (and relying on RVO) might be better.

relying on optimizations is dangerous business :-) besides I'm not
much of a C++ expert, so copying things around all the time sounds a
bit scary..

Also, I have read that support for multithreading, injudiciously
applied by library providers, can have a ruinous effect upon
performance of standard containers. So it may be worth having a
look at the documentation (or on google) to see if there are
any such issues surrounding __gnu_cxx::hash_set.

I cannot find any useful information.

I did write another test, without using a GC and default hash
function, and I cannot get it to run much faster than this (still
several times slower than Python):

#include <ext/hash_set>

int main() {
    typedef __gnu_cxx::hash_set<int> HS;
    HS *ha = new HS();
    HS *hb = new HS();
    HS *hc;

    HS::iterator it1, it2;

    for(int i = 0; i<10000; i++)
        ha->insert(rand() % 100000);
    for(int i = 0; i<10000; i++)
        hb->insert(rand() % 100000);

    for(int i = 0; i<1000; i++) {
        hc = new HS(*ha);
        hc->insert(hb->begin(), hb->end());

        delete hc;
    }

}

does anyone have an idea how to speed this up..?

this is the python equivalent:

import random

seta = set([random.randint(0,100000) for n in xrange(10000)])
setb = set([random.randint(0,100000) for n in xrange(10000)])

for i in xrange(1000):
    seta | setb

thanks!
mark dufour.