Re: SuperKISS for 32- and 64-bit RNGs in both C and Fortran.
On Fri, 27 Nov 2009 13:52:21 -0800 (PST), geo <gmarsaglia@gmail.com>
wrote:
On Nov 3 I posted
RNGs: A Super KISS sci.math, comp.lang.c, sci.crypt
a KISS (Keep-It-Simple-Stupid) RNG combining,
by addition mod 2^32, three simple RNGs:
CMWC(Complementary-Multiply-With-Carry)+CNG(Congruential)
+XS(Xorshift)
with resulting period greater than 10^402575.
The extreme period comes from finding a prime p=a*b^r+1 for
which the order of b has magnitude near p-1, then use
the CMWC method, the mathematics of which I outline here:
Let Z be the set of all "vectors" of the form
[x_1,...,x_r;c] with 0<=x_i<b and 0<=c<a.
Then Z has ab^r elements, and if the function f() on Z is
f([x_1,x2,...,x_r;c])=
[x_2,...,x_r,b-1-(t mod b);trunc((t/b)], t=a*x_1+c
then f() has an inverse on Z: for each z in Z there is exactly
one w in Z for which f(w)=z:
f^{-1}([x_1,x2,...,x_r;c])=
[trunc((v/a),x_1,...,x_{r-1}; v mod a], v=cb+(b-1)-x_r
Thus a directed graph based on z->f(z) will consist only
of disjoint loops of size s=order(b,p) and there will be
L=ab^r/s such loops.
A random uniform choice of z from Z is equally likely to
fall in any one of the L loops, and then each "vector" in
the sequence obtained by iterating f:
f(z), f(f(z)), f(f(f(z))),...
will have a uniform distribution over that loop, and the sequence
determined by taking the r'th element from each "vector",
(the output of the CMWC RNG) will be periodic with period
the order of b for the prime p=ab^r+1, and that sequence
will produce, in reverse order, the base-b expansion of a
rational k/p for some k determined by choice of the seed z.
For that Nov 3 post, I had found that the order of b=2^32
for the prime p=7010176*b^41790+1 is 54767*2^1337279, about
10^402566, thus providing an easily-implemented
KISS=CMWC+CNG+XS RNG with immense period and
excellent performance on tests of randomness.
That easy implementation required carrying out the essential
parts of the CMWC function f(): form t=7010176*x+c in 64 bits,
extract the top 32 bits for the new c, the bottom 32 for
the new x---easy to do in C, not easy in Fortran.
And if we want 64-bit random numbers, with B=2^64, our prime
becomes 7010176*B^20985+1, for which the period of B is
54767*2^1337278, still immense, but in C, with 64-bit x,c
there seems no easy way to form t=7010176*x+c in 128 bits,
then extract the top and bottom 64 bit halves.
So base b=2^32 works for C but not Fortran,
and base B=2^64 works for neither C nor Fortran.
I offer here is a prime that provides CMWC RNGs for both
32- and 64-bits, and for both C and Fortran, and with
equally massive periods, again greater than 2^(1.3million):
p=640*b^41265+1 = 2748779069440*B^20632+1 = 5*2^1320487+1.
That prime came from the many who have dedicated their
efforts and computer time to prime searches. After some
three weeks of dedicated computer time using pfgw with
scrypt, I found the orders of b and B:
5*2^1320481 for b=2^32, 5*2^1320480 for B=2^64.
It is the choice of the "a" that makes it feasible to get
the top and bottom valves of t=a*x+c, yet stay within the
integer sizes the C or Fortran compilers are set for.
In the above prime: a=640=2^9+2^7 for b=2^32 and
a=2748779069440=2^41+2^39 for B=2^64.
Thus, for example with b=2^32 and using only 32-bit C code,
with a supposed 128-bit t=(2^9+2^7)*x+c, the top and bottom
32-bits of t may be obtained by setting, say,
h=(c&1); z=(x<<9)>>1 + (x<<7)>>1 + c>>1;
then the top half of that t would be
c=(x>>23)+(x>>25)+(z>>31);
and the bottom half, before being complemented, would be
x=(z<<1)+h;
When B=2^64 we need only change to
h=(c&1); z=(x<<41)>>1 + (x<<39)>>1 + c>>1;
c=(x>>23)+(x>>25)+(z>>63);
These C operations all have Fortran equivalents, and will
produce the required bit patterns, whether integers are
considered signed or unsigned. (In C, one must make sure
that the >> operation performs a logical right shift,
perhaps best done via "unsigned" declarations.)
The CMWC z "vector" elements [x_1,x_2,...,x_r] are kept in
an array, Q[] in C, Q() in Fortran, with a separate current
"carry". This is all spelled out in the following examples:
code for 32- and 64-bit SuperKiss RNGs for C and Fortran.
Note that in these sample listings, the Q array is seeded
by CNG+XS, based on the seed values specified in the
initial declarations. For many simulation studies, the
73 bits needed to seed the initial xcng, xs and carry<a
for the 32-bit version, or 169 bits needed for the 64-bit
version, may be adequate.
But more demanding applications may require a significant
portion of the >1.3 million seed bits that Q requires.
See text and comments from the Nov 3 posting.
I am indebted to an anonymous mecej4 for providing the basic
form and KIND declarations of the Fortran versions.
Please let me and other readers know if the results are not
as specified when run with your compilers, or if you can
provide equivalent versions in other programming languages.
George Marsaglia
--------------------------------------------------------
Here is SUPRKISS64.c, the immense-period 64-bit RNG. I
invite you to cut, paste, compile and run to see if you
get the result I do. It should take around 20 seconds.
--------------------------------------------------------
/* SUPRKISS64.c, period 5*2^1320480*(2^64-1) */
#include <stdio.h>
static unsigned long long Q[20632],carry=36243678541LL,
xcng=12367890123456LL,xs=521288629546311LL,indx 632;
#define CNG ( xcng=6906969069LL*xcng+123 )
#define XS ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
#define SUPR ( indx<20632 ? Q[indx++] : refill() )
#define KISS SUPR+CNG+XS
unsigned long long refill( )
{int i; unsigned long long z,h;
for(i=0;i<20632;i++){ h=(carry&1);
z=((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
carry=(Q[i]>>23)+(Q[i]>>25)+(z>>63);
Q[i]=~((z<<1)+h); }
indx=1; return (Q[0]);
}
int main()
{int i; unsigned long long x;
for(i=0;i<20632;i++) Q[i]=CNG+XS;
for(i=0;i<1000000000;i++) x=KISS;
printf("Does x=4013566000157423768\n x=%LLd.\n",x);
}
---------------------------------------------------------
Here is SUPRKISS32.c, the immense-period 32-bit RNG. I
invite you to cut, paste, compile and run to see if you
get the result I do. It should take around 10 seconds.
---------------------------------------------------------
/*suprkiss64.c
b=2^64; x[n]=(b-1)-[(2^41+2^39)*x[n-20632]+carry mod b]
period 5*2^1320480>10^397505
This version of SUPRKISS doesn't use t=a*x+c in 128 bits,
but uses only 64-bit stuff, takes 20 nanos versus 7.5 for
the 32-bit unsigned long long t=a*x+c version.
*/
/* SUPRKISS64.c, period 5*2^1320480*(2^64-1) */
#include <stdio.h>
static unsigned long long Q[20632],carry=36243678541LL,
xcng=12367890123456LL,xs=521288629546311LL,indx 632;
#define CNG ( xcng=6906969069LL*xcng+123 )
#define XS ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
#define SUPR ( indx<20632 ? Q[indx++] : refill() )
#define KISS SUPR+CNG+XS
unsigned long long refill( )
{int i; unsigned long long z,h;
for(i=0;i<20632;i++){ h=(carry&1);
z=((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
carry=(Q[i]>>23)+(Q[i]>>25)+(z>>63);
Q[i]=~((z<<1)+h); }
indx=1; return (Q[0]);
}
int main()
{int i; unsigned long long x;
for(i=0;i<20632;i++) Q[i]=CNG+XS;
for(i=0;i<1000000000;i++) x=KISS;
printf("Does x=4013566000157423768\n x=%LLd.\n",x);
}
-----------------------------------------------------------
And here are equivalent Fortran versions, which, absent
C's inline features, seem to need ~10% more run time.
[Fortran versions removed, along with comp.lang.fortran
and comp.lang.c]
---------------------------------------------------------------
Two simple Java versions. Neither extends System.Random, though it
would not be too difficult to do so if required.
rossum
/*** 32 bit SuperKiss32 ***/
/**
* <p>An implementation of George Marsaglia's long period SuperKISS32
* PRNG. The period of this PRNG is 5*2^1320481*(2^32-1)</p>
*
* <p>This version is written for clarity rather than speed.</p>
*
* @author rossum
* @version 1.0; November 2009
*/
public class SuperKiss32 {
final int Q_SIZE = 41265;
int mQ[] = new int[Q_SIZE];
private int mCarry = 362;
private int mXcng = 1236789;
private int mXs = 521288629;
private int mIndex = Q_SIZE + 1;
public SuperKiss32() {
// Fill mQ[] with Congruential + XorShift
for (int i = 0; i < Q_SIZE; ++i) {
congruential();
xorShift();
mQ[i] = (mXcng + mXs);
}
}
public int nextInt() {
return stepGenerator();
}
private int refillQ() {
int z, h;
for (int i = 0; i < Q_SIZE; ++i) {
h = mCarry & 0x01;
z = ((mQ[i] << 9) >>> 1) +
((mQ[i] << 7) >>> 1) +
(mCarry >>> 1);
mCarry = (mQ[i] >>> 23) + (mQ[i] >>> 25) + (z >>> 31);
mQ[i] = ~((z << 1) + h);
}
mIndex = 1;
return mQ[0];
}
private void congruential() {
mXcng *= 69069;
mXcng += 123;
}
private void xorShift() {
mXs ^= (mXs << 13);
mXs ^= (mXs >>> 17);
mXs ^= (mXs << 5);
}
private int supr() {
return (mIndex < Q_SIZE) ? mQ[mIndex++] : refillQ();
}
private int stepGenerator() {
congruential();
xorShift();
return (supr() + mXcng + mXs);
}
public static boolean testKiss32(boolean verbose) {
int x = 0;
final int expected = 1809478889;
boolean retVal;
SuperKiss32 sk32 = new SuperKiss32();
if (verbose) {
System.out.println("Testing SuperKiss32...");
System.out.println();
}
int reps = 1000000000;
for (int i = 0; i < reps; ++i) {
x = sk32.stepGenerator();
}
if (x == expected) {
if (verbose) {
System.out.println(
"One billion uses of SuperKiss32 OK.");
}
retVal = true;
} else {
if (verbose) {
System.out.println("SuperKiss32 failed: x == " +
x + ", not " + expected);
}
retVal = false;
}
return retVal;
}
} // end class SuperKiss32
/**************************/
/*** 64 bit SuperKiss64 ***/
/**
* <p>An implementation of George Marsaglia's long period SuperKISS64
* PRNG. The period of this PRNG is 5*2^1320480*(2^64-1)</p>
*
* <p>This version is written for clarity rather than speed.</p>
*
* @author rossum
* @version 1.0; November 2009
*/
public class SuperKiss64 {
final int Q_SIZE = 20632;
long mQ[] = new long[Q_SIZE];
private long mCarry = 36243678541L;
private long mXcng = 12367890123456L;
private long mXs = 521288629546311L;
private int mIndex = Q_SIZE + 1;
public SuperKiss64() {
// Fill mQ[] with Congruential + XorShift
for (int i = 0; i < Q_SIZE; ++i) {
congruential();
xorShift();
mQ[i] = (mXcng + mXs);
}
}
public long nextLong() {
return stepGenerator();
}
private long refillQ() {
long z, h;
for (int i = 0; i < Q_SIZE; ++i) {
h = mCarry & 0x01L;
z = ((mQ[i] << 41) >>> 1) +
((mQ[i] << 39) >>> 1) +
(mCarry >>> 1);
mCarry = (mQ[i] >>> 23) + (mQ[i] >>> 25) + (z >>> 63);
mQ[i] = ~((z << 1) + h);
}
mIndex = 1;
return mQ[0];
}
private void congruential() {
mXcng *= 6906969069L;
mXcng += 123L;
}
private void xorShift() {
mXs ^= (mXs << 13);
mXs ^= (mXs >>> 17);
mXs ^= (mXs << 43);
}
private long supr() {
return (mIndex < Q_SIZE) ? mQ[mIndex++] : refillQ();
}
private long stepGenerator() {
congruential();
xorShift();
return (supr() + mXcng + mXs);
}
public static boolean testKiss64(boolean verbose) {
long x = 0;
final long expected = 4013566000157423768L;
boolean retVal;
SuperKiss64 sk64 = new SuperKiss64();
if (verbose) {
System.out.println("Testing SuperKiss64...");
System.out.println();
}
int reps = 1000000000;
for (int i = 0; i < reps; ++i) {
x = sk64.stepGenerator();
}
if (x == expected) {
if (verbose) {
System.out.println(
"One billion uses of SuperKiss64 OK.");
}
retVal = true;
} else {
if (verbose) {
System.out.println("SuperKiss64 failed: x == " +
x + ", not " + expected);
}
retVal = false;
}
return retVal;
}
} // end class SuperKiss64
/**************************/