Hacker News new | past | comments | ask | show | jobs | submit login
Sieve of Eratosthenes in Python (2015) (solipsys.co.uk)
106 points by ColinWright on Oct 5, 2017 | hide | past | favorite | 49 comments



For anyone interested in implementations of the Sieve of Erastothenes, here is a very interesting paper about functional ones, which are harder than they appear at first: https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf


I implemented that in chez scheme recently, and it is pretty darn fast.

https://bitbucket.org/bjoli/primes/src (the file lazy-prime-sieve.scm)

it uses Nietzsche which can be found in my repos. On my three year old computer, it computes the first ten million primes in about 40 seconds (up to, and including 179424673). That is only about 4 times slower than the optimized non-lazy naive sieve.


That's pretty cool. Lest we forget, "computers are fast."

I imagine computing the first ten million primes once upon a time required a super computer and hundreds of hours.


Years don't tell much, what kind of CPU is it ? Still very nice to see.


It is an Ivy Bridge or Haswell mobile i7. It was pretty high en for a laptop in late 2013/early 2014.


The sieve of Eratosthenes can be adapted to build tables of other number properties: smallest prime factor, totient function, number of unique prime factors (omega), product of unique prime factors (radical).

Contains code in Python and other languages: https://www.nayuki.io/page/the-versatile-sieve-of-eratosthen...


I first learned of this sieve while reviewing the code for the Primecoin proof-of-work. Primecoin is a cryptocurrency that tries to have its PoW provide greater utility than hashing, but its PoW otherwise is similar to hashing. The task is to produce Cunningham chains [1] of prime numbers. The reference implementation of Primecoin computes primes using the Sieve of Eratosthenes (w/arbitrary precision integer math).

[1] https://en.wikipedia.org/wiki/Cunningham_chain


What is the use of producing these primes?


Here is very concise implementation of sieve I use in algorithm contests:

  int n = 100;
  vector<int> p(n);

  void sieve()
  {
      for(int i = 2; i < n; i++)
          for(int j = i; j < n; j += i) p[j]++;
  }

  int is_prime(int k)
  {
      return p[k] == 1;
  }


There's a small optimization on the original article which I think your implementation doesn't do: "At each stage we know that every composite number up to p^2 has definitely been marked as such"..

Assuming your implementation is working fine (didn't test it), I think the change would be j = i * i on the inline for:

  void sieve()
  {
      for(int i = 2; i < n; i++)
          for (int j = i*i; j < n; j += i) p[j]++;
  }
It's tricky to explain, but take for instance the numbers from 1 to 20. When i=2, it will clear [4, 6, 8, 10, 12, 14, 16, 18, 20] as non-primes. So when i=3, you can skip 6 (which was cleared by i=2) and start directly at 9 (i * i = 3 * 3) to clear [9, 12, 15, 18]. Also, when i=4, you can skip the 8 and 12 (which were cleared by i=2 and i=3) and start on 16 to clear [16, 20]. Finally, when i=5, you can start directly at 25 (out of bounds in this case) because 10, 15 and 20 were already cleared by the previous iterations..

Edit: But as other posters are saying below, the original sieves algorithm shouldn't even have to test i=4 because the numbers it clears (naivelly [4,8,12,16,20]) have already been cleared by the previous iterations of the algorithm.


> It's tricky to explain

This reasoning is fairly simple:

Suppose we have a composite number K that is less than P^2.

Then K = A * B, for some A and B not equal to 1, (which is the definition of composite). Let us say that A is less than or equal to B. (If this is not the case, switch the labels.)

Then A must be less than P:

If A is greater than P, K = A * B > A * P > P^2. But we chose K such that K < P^2.

If A is less than P, then K will have been marked during the cycle of Q, a prime divisor of A, which will occur before P. Therefore, we do not need to examine any numbers below P^2 -- they are either prime or have been marked.


Maybe stop the outer loop as soon as i * i > n then as well.

Also, where's the check whether `i` is actually prime before you start crossing off multiples? You don't need to cross off multiples of composite numbers.


> Also, where's the check whether `i` is actually prime before you start crossing of multiples? You don't need to cross off multiples of composite numbers.

You are absolutely right as the original method seems to specify that. I've edited my answer to show it.

So the function would be something like (untested):

  void sieve()
  {
      for (int i = 2; i*i < n; i++)
          if (p[i] == 1)
              for (int j = i*i; j < n; j += i) p[j]++;
  }
There's the problem, as other posters are saying, that if p[j] == 1 is because it was cleared as non-prime, but I've just tested p[i] == 1 for primality for coherence with the original author.


You can avoid the more expensive squaring in favour of addition if you keep a couple of extra counters.


My Project Euler prime sieving / counting code was

  vector<uint8_t> is_prime(n, true);

  // Sieve
  for (uint64_t k = 3; k * k < n; k += 2)
    if (is_prime[k])
      for (uint64_t multiple = k * k; multiple < n; multiple += 2 * k)
        is_prime[multiple] = false;
It only loops over odd numbers; it only crosses off odd multiples of prime numbers. Note the stop condition of the outer loop and starting point of the inner loop.

Note that this code is memory-bound. You can get nearly 10x speed up if you implement a segmented sieve (sieving [2,n] in intervals, keeping memory costs more or less fixed)


Here's my Python version. Same thing, using slice-skip addressing to replace the inner loop.

    n=1000000
    isPrime = [True] * n
    for i in xrange(2,int(n**0.5)+1):
      if isPrime[i]: isPrime[i*i::i] = [False] * ((n-i*i-1)/(i)+1)


Slightly modified version, which uses some of the bounds adjustments suggested by other comments:

    int n = 100
    vector<int> p(n);

    void sieve()
    {
        for(int i = 2; i * i < n; i++) {
            if (p[i] == 0) { // Could probably just !p[i]
                for(int j = i * i; j < n; j += i) {
                    p[j]++;
                }
            }
        }
    }

    bool is_prime(int k)
    {
        return p[k] == 0;
    }


Whoa! Code golf. Here is the most compact I could make it in racket in about 3 minutes:

    (define (erathostenes n)
      (let ([vec (make-vector n #t)])
        (for/list ([i (in-range i n)] #:when (vector-ref vec i))
          (for ([j (in-range (* i i) n i)])
            (vector-set! vec j #f))
          i)))
It could probably be squeezed into a oneliner in python :)


If the goal is conciseness, let's use K. Start with a range:

      1+!20
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Prepend runs of 0 of each length with 1:

      (1,&:)'1+!20
    (1 0
     1 0 0
     1 0 0 0
     1 0 0 0 0
     1 0 0 0 0 0
     1 0 0 0 0 0 0
     1 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
Cycle each pattern to form a matrix:

      (20#1,&:)'1+!20
    (1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
     1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0
     1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
     1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
     1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0
     1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
     1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
     1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0
     1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
Sum the rows of the matrix:

      +/(20#1,&:)'1+!20
    20 0 1 1 2 1 3 1 3 2 3 1 5 1 3 3 4 1 5 1
Find indices with exactly one factor:

      &1=+/(20#1,&:)'1+!20
    2 3 5 7 11 13 17 19
Generalize to any upper limit x:

      {&1=+/(x#1,&:)'1+!x}100
    2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
As an afterthought, I realized it's easier and equivalent to work from a 0-based range:

      {&2=+/(x#1,&:)'!x}100
    2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
I couldn't precisely remember this function, but I'd seen the approach before and it was easy to build up piece by piece in the REPL.


That sounds like a challenge (but I've goofed off work enough for one day so they'll have to be mediocre solutions):

  set(range(2, MAX_NUM)) - {j for i in range(2, MAX_NUM) for j in range(2*i, MAX_NUM, i)}
Or to put it in the same form as the original:

  [False, False] + [k not in {j for i in range(2, MAX_NUM) for j in range(2*i, MAX_NUM, i)} for k in range(2, MAX_NUM)]


Have some Erlang:

    sieve(N) -> sieve(lists:seq(2, N), []).
    sieve([], L) -> lists:reverse(L); 
    sieve([Prime|T], L) -> sieve([X || X <- T, X rem Prime /= 0], [Prime|L]).


Depending on how you plan to use it, a bitset may be more efficient (at least memory wise). E.g. https://gist.github.com/cslarsen/1590498


I... don't think that works. Doesn't that just populate every p[j] with j+1?


It fills the array with the number of factors each number has, by taking each number less than n, and for each integer multiple of it less than n incrementing that cell.


No, it replaces p[j] with p[j]+1. This way, only those that have 1 are prime. They aren't divisible, except by a single number, since the count is 1.


He's adding i, not 1, to j on every inner loop. Basically he's incrementing any array element that is a multiple of 2, then of 3, etc.


Then 4, then 5, then 6 ... in other words he's not doing the test that something is prime, and crossing out the multiples of composites as well.

There are other problems as well, he adds one each time to p[j], but tests to see if it's ==1, so that won't work.

So while the output might be right (but it isn't), it's certainly not the actual Sieve of Eratosthenes for any one of several reasons.


> in other words he's not doing the test that something is prime, and crossing out the multiples of composites as well.

This doesn't change the outcome, since the multiples of composites are also multiples of primes and never primes.

> There are other problems as well, he adds one each time to p[j], but tests to see if it's ==1, so that won't work.

You're wrong here -- testing equality to 1 for the number of times an entry in the array was hit is exactly testing of primality post-sieve. If it was only touched once, then it has nothing that divides it (since the only cycle it was incremented on is the one starting there).

> it's certainly not the actual Sieve of Eratosthenes for any one of several reasons

If you initialized the array entries to 1 and started indexing your inner loop at j= i * i, it's exactly the sieve (modulo multiple strikeouts of removed numbers).


>> ... he's not doing the test that something is prime, and crossing out the multiples of composites as well.

> This doesn't change the outcome, since the multiples of composites are also multiples of primes and never primes.

It doesn't change the outcome (I said that somewhere else) but it changes the algorithm. The algorithm now produces primes, yes, but it's not the Sieve of Eratosthenes.

>> ... he adds one each time to p[j], but tests to see if it's ==1, so that won't work.

> You're wrong here -- testing equality to 1 for the number of times an entry in the array was hit is exactly testing of primality post-sieve. If it was only touched once, then it has nothing that divides it (since the only cycle it was incremented on is the one starting there).

You're right, so this is definitely a different algorithm. In this algorithm what he does is:

  Initialise a vector V to 0
  For every number n>=2:
    For every multiple m>=n of n:
      Add one to V[m]
The primes are those with V[n]==1. Yes, that works, I was mistaken because I was expecting the SoE. This is not that.

>> it's certainly not the actual Sieve of Eratosthenes for any one of several reasons

> If you initialized the array entries to 1 and started indexing your inner loop at j= i \* i, it's exactly the sieve (modulo multiple strikeouts of removed numbers).

... and conditioned on striking out multiples of primes you've already identified.

There are two key components to the Sieve of Eratosthenes. One is that you walk down the vector instead of doing divisions, the other is that you identify the primes so far and only strike out those multiples.

In particular, this code must have the outer loop run all the way to n whereas the SoE only runs to sqrt(n). That's a key part of being the Sieve.

As observed by stabbles[0], the time complexity of the code presented here is worse than the time complexity of the Sieve of Eratosthenes, therefore it's a different algorithm.

[0] https://news.ycombinator.com/item?id=15409394


Your review of the code was incorrect. The code tests primality with the ==1 check that you say is wrong.

Why do you say that the output is wrong?


I was wrong when I claimed the output would be wrong - see my comments here[0]. The output is correct, but the code is not the Sieve of Eratosthenes. I was misled by the expectation that it would be.

[0] https://news.ycombinator.com/item?id=15410300


Good point. The loop's upper-bound check differs.

Thanks!


This has _not_ the same time complexity as the sieve of Erasthones.



Back about 1964, when I was learning to program (IBM 7090 in assembly language), I did it at assembly time in the surprisingly powerful IBM 7090 macro assembly language. Probably somewhat slower.


Do you still have a reference for that?


How long does that take to compile?


Haven't timed it, but it is fast.

You can see how quick the assembly is generated on godbolt [1]

And that verifies it is fully compile-time indeed :).

[1] https://godbolt.org/g/VHvxhi

edit: it is about 1 second up to n = 262142, after that the compiler delegates the computation to runtime.


Re: the 'more Dynamic"' version, have you seen the amazing postponed sieve generator?

This is the 'really Dynamic' version, it recursively treats all previous primes as a special case.

https://stackoverflow.com/questions/2211990/how-to-implement...


This is a well written post. It's not new stuff, but the combination of the sieve, along with the python code makes it pretty accessible.

The site itself is actually really interesting, I never knew about Russian Peasant Multiplication before [0].

EDIT: Fixed typo in link, thanks!

[0] http://www.solipsys.co.uk/new/RussianPeasantMultiplication.h...


JFYI, I find it simpler to use this method, (same method of halving and doubling) used often for Roman numerals multiplication, but in the version that uses two colums only, crossing out the lines not needed (more similar to the "sieve" approach BTW):

http://rbutterworth.nfshost.com/Tables/romanmult

and making a final sum (as opposed to partial sums as in the given "Russian Peasant" reference).


Thanks for fixing the link - I was going to remove this comment for the sake of being tidy (sys admin thinking) but it appears I can't delete it, only edit it.

So I've done that.

Cheers!


( It's okay not to tidy up all the history.

Have a good day everyone!


Reminds me of some code I wrote for a codeeval challenge for returning the sum of all primes up to n inclusive. IE: sum_primes(2) = 2, sum_primes(3) = 2+3 = 5 = sum_primes(4)...

It's odd that this version doesn't use a set(), but dictionary/list.

Anyway, I'll have to refactoring my code and measure against op - although I only used the python as a reference to sanity-check my naive c++ code:

https://github.com/e12e/codeeval/blob/master/4-sumprime/prim...

(I believe, like most(?) codeeval challenges, this is from project Euler)

[ed: should probably add that this isn't a sieve]


I usually use this sieve any time I need to work with small prime numbers, but I dug deeper into probabilistic primality tests recently. If you try to run a sieve on large numbers you quickly realize that it is not feasible. Probabilistic primality tests like the Miller-Rabin are very fast and incredibly accurate, yet they are nearly as simple as this sieve. I didn't fully trust RSA until I learned enough about Miller-Rabin to convince myself the prime numbers it uses are safe.


If you want to do your own visualizations using PyAlgoViz, check out http://chrislaffra.blogspot.nl/2016/12/generating-prime-numb...


very slightly relevant:

    Upon a stack of bits, about so tall,
    just think, O traveller, what we could do
    if every other bit was set to false
    beginning with (but not including) two?
    
    Now take two lowly numbers, A and B
    that equal three and two. What would happen
    if they, by twos and ones respectively,
    were incremented in a nested fashion,
    
    And if we falsified, at every turn,
    the value offset by A groups of Bs?
    Then to the aether let those bits return,
    to fly back home to Eratosthenes!
from www.ineptech.com (the "Langston" example)


The Eratosthenes sieve is pretty basic. I would expect any programmer to be able to do it on the whiteboard. Now the quadratic sieve, that is a challenge. The GNFS even more so.


These sieves are indeed much more complex, but are not really related to the sieve of Erathostenes except in name. Not clear why you bring them up.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: