Re: Randomly selecting an element from an STL collection
On Jun 28, 9:23 pm, Generic Usenet Account <use...@sta.samsung.com>
wrote:
On Jun 27, 5:55 am, James Kanze <james.ka...@gmail.com> wrote:
I wonder if it's really necessary to bother. He's using rand()
to choose an element, and RAND_MAX can be (and far too often is)
as little as 32767. So if he has more elements that that, his
code isn't going to work. Also, he used the expression
rand() % distance to choose the element. This introduces a bias
toward the smaller elements; the closer distance is to RAND_MAX,
the larger the biais. Unless the number of elements is actually
very small, the biais will be noticeable. So we can conclude
that the actual number of elements will probably not be much
more than 10 or 20. And that a simple int will work just fine.
I would really appreciate if you
could tell me how I can avoid the "bias towards smaller elements". I
have always used the "modulo division by random number" trick to
restrict my pseudo-random numbers to a certain range.
That's usually sufficient for fairly small values. The bias
towards the smaller elements problems is easily understood,
however, if you consider a perfect random generator with a very
small RAND_MAX. Suppose RAND_MAX is 9, i.e. your generator
generates all of the numbers from 0 to 9, with equal
probability. Now consider what happens if you do a modulo 6.
What are the results for each number generated:
0 % 6 = 0
1 % 6 = 1
2 % 6 = 2
3 % 6 = 3
4 % 6 = 4
5 % 6 = 5
6 % 6 = 0
7 % 6 = 1
8 % 6 = 2
9 % 6 = 3
It's obvious that the values 0-3 each appear 20% of the time,
whereas 4 and 5 only appear 10%.
The problem doesn't occur if RAND_MAX + 1 is a multiple of your
value, so the usual solution is just to throw out any values
above the last multiple. For a random value in the range
0...N-1, I use something like:
unsigned const limit = RAND_MAX+1 - (RAND_MAX+1) % N ;
unsigned result = next() ;
while ( result >= limit ) {
result = next() ;
}
return result % N ;
I am keen to
avoid this mistake at some future date. A sample code snippet would
be great. Although I do not expect anywhere close to 32767 entries in
my collection, it is certainly going to be much more than 10 or 20.
Be careful with rand(). In the past, at least, a lot of
implementations have been very bad... some systematically
alternate even and odd, for example. For anything more than
just playing around, get a good generator with known behavior.
(Boost has quite a few.)
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34