Re: [Algorithm] Sum of Primes < 1000000
Lew wrote:
Then you didn't read it and understand it. Please try again, or give
up, but there's really no need to attack.
I saw no attack in Simon's post. Did I miss something?
Yes; the fact that he was accusing me of being bad at math.
Simon raised some interesting mathematically-motivated points.
Well, to start with, he basically accused Wikipedia of having the
integral of ln(x) wrong.
He also correctly stated that an average of a set exceeds 1/4 the size
of the set, but thereby was trying to imply that the optimization sucked
because something was therefore O(N) in the total number range instead
of O(log N). The set was of primes, and the size of that is O(log N), so
1/4 of that is O(log N). He actually ends up proving the weak result
that the size of the average prime less than N is *at least* O(log N)
(not ruling out O(N) or whatever); I had gotten the stronger result that
the size of the average prime *was* only O(log N).
The reasoning in a nutshell:
Algorithm to build sieve up to N as I stated it has three parts. Loop
until sqrt(N) (O(sqrt(N)). For each p reached that's not gotten marked
composite earlier, loop from p^2 until N step p. The number of these
loops is O(P), P the number of primes up to sqrt(N). The length of a
given one is O(N/p). Turns out P is O(log(sqrt(N)) = O(1/2 log(N)) =
O(log(N)), so this loop is O(log(N)*X) where the average length of a
loop is O(X). What is the average length of a loop?
It's O(N/p) of course, for the average choice of p. Primes are
logarithmically distributed (P is O(log(N)) above) so the average p is
going to be small fairly independently of the range. It's hard to be
sure it's O(1) though. The average is the sum divided by the total
number. The sum of the primes, the way they are logarithmically
distributed, could be approximated by the integral of the logarithm. (It
might actually be better now I think of it to integrate N/p -- N*(1/2 +
1/3 + 1/5...) looks like N times an exponential decay function, whose
integral is itself and which is clearly O(1))
I concluded that the average length of the loop is bounded by O(N), and
the whole algorithm by O(N log N). If it's actually N log log N or can
be made such, so much the better.