One of the earlier exercises in computational methods is the prime number sieve. The straightforward yet non-intuitive nature of prime numbers makes it a mathematical treat. At first, the concept is simple:

$$ \mbox{for} \quad p,q \in \mathbb N \\ \mbox{s.t.} \quad p \geq 2 \quad and \quad q \neq p,1 \\ p \quad is\quad prime \iff p/q \notin \mathbb N $$

This may not be the most graceful notation, but I will lay it out more clearly. Given \(p\) and \(q\) in \(\mathbb N\) where \(\mathbb N\) is the set of 'natural numbers \(\mathbb N = \{0,1,2,3,...\}\). Such that \(p\) is greater than or equal to 2, and \(q\) is not \(p\) or \(1\). \(p\) is prime if and only if \(p\) divided by \(q\) is not a natural number. The only divisors of a prime number are \(p\) and \(1\).

While simple to parse through the first few primes: \(2, 3, 5, 7, 11, 13, 17, 19, ...\) the inability to quickly determine if a larger number, say \(n > 10^{3}\), is prime has troubled the mathematical world for some time now. I have a personal frustration where a four year old can tell you if \(10,001\) is an even number or odd number, but would be unable to even approach the question if it is prime or not. Now there is a theorem that helps us to some degree, intuitively named the prime number theorem (PNT).

$$ \lim_{n\rightarrow \infty} \frac{\pi(n)}{\frac{n}{\ln{n}}} = 1 $$

Where \(\pi(n)\) is the number of primes \(p\) such that \(2\leq p \leq n\). This can be used to assist in testing if a number is prime, but out of scope for the content of this article.

Regardless of the tricks that have been developed, I still won't be able to help a four year old solve if \(10,001\) is prime or not. That is what motivates me to create these simple computer programs. First you can solve answer the question, but equally investigating on how to improve the efficiency of a program also helps refine your understanding of the problems and ultimately create better code/math as a result.

Let's write a prime number sieve.

Pseudocode:

set upper limit
make a prime list

loop over n from 2 to upper limit

    set "still_prime" flag to True

    loop on p from 2 to n-1

        is n divisible by p?
            yes: set "still_prime" flag to False
    is "still_prime" flag True?
        yes: append n to prime list
output primes found

This is the most intuitive understanding of the concept, how about implementing it in Python. Note I'm not going to use Numpy, although I could squeeze some improvements in, but maybe lose some in other places as well! This program only has time for measurement and uses % the modulus function for efficient computing.

Simple Prime Number Sieve

import time

upper_limit = 10001
primes_found = []

time_start = time.time()
# iterate from 2 to limit
for n in range(2,upper_limit+1):

    still_prime = True
    # iterate from 2 to number
    for p in range(2,n):
        # if the number is divisible, not prime.
        if n%p == 0:

            still_prime = False

    if still_prime:

        primes_found.append(n)

time_end = time.time()

print( "Total number of primes found:", len(primes_found) )
print( "Primes found:" )
#print last 10 primes
print( primes_found[-10:] )
print("Run time:", time_end-time_start)
Total number of primes found: 1229
Primes found:
[9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
Run time: 5.7099528312683105

It works, it is correct, but at the cost of 5.71 seconds on my machine! Why invest in a nice gaming computer in the first place?! Sheesh. I got problems to solve on Project Euler and I am sure as hell not going to wait 5.71 seconds just to find that read the problem wrong. We. can. do. better.

We have two for loops iterating across large numbers... the statement if n%p = 0: still_prime = False really only needs to be called if the 'still_prime' condition is still true.

Modification #1: Check if previous number was prime before testing

upper_limit = 10001
primes_found = []

time_start = time.time()

for n in range(2,upper_limit+1):

    still_prime = True

    for p in range(2,n):
        #Added this condition
        if still_prime:

            if n%p == 0:

                still_prime = False

    if still_prime:

        primes_found.append(n)


time_end = time.time()

print( "Total number of primes found:", len(primes_found) )
print( "Primes found:" )
#print last 10 primes
print( primes_found[-10:] )
print("Run time:", time_end-time_start)
    Total number of primes found: 1229
    Primes found:
    [9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
    Run time: 3.2463064193725586

A substaintial reduction of computing time, from 5.70 to 3.25 seconds, while continuing to perform correctly. But upon further investigation, there is more room for improvement. In fact, the most apparent truth of primes is regarding what they can be composed of. Let's say \(n\) is 100. 100 = \(10*10\). That means that if I want to see if a number is prime less than 100, I only have to check to the square root of 100, 10!

Brilliant.

Modification #2: Teset only to \(\sqrt n\)

upper_limit = 10001
primes_found = []

time_start = time.time()

for n in range(2,upper_limit+1):

    # set sqrt(n)
    sqrt_of_n = n**0.5
    # set "checked all" flag to False
    checked_all = False

    still_prime = True

    for p in range(2,n):
        # add the 'not checked_all' to the condition statement
        if still_prime and not checked_all:

            if p > sqrt_of_n:
                # check if you have covered the numbers to sqrt(n) 
                checked_all = True

            elif n%p == 0:

                still_prime = False

    if still_prime:
        primes_found.append(n)


time_end = time.time()

print( "Total number of primes found:", len(primes_found) )
print( "Primes found:" )
#print last 10 primes
print( primes_found[-10:] )
print("Run time:", time_end-time_start)
Total number of primes found: 1229
Primes found:
[9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
Run time: 2.696723461151123

Next, now that we are computing to \(\sqrt{n}\), we can also realize that we don't need to check p from 2 to n. We only need to check to the primes found for our second for loop.

Modification #3: Test from current primes_list, not to n.

upper_limit = 10001
primes_found = []

time_start = time.time()

for n in range(2,upper_limit+1):

    sqrt_of_n = n**0.5
    checked_all = False
    still_prime = True

    #changed range from 2,n to primes_found
    for p in primes_found:

        if still_prime and not checked_all:

            if p > sqrt_of_n:

                checked_all = True

            elif n%p == 0:

                still_prime = False


    if still_prime:

        primes_found.append(n)

time_end = time.time()
print( "Total number of primes found:", len(primes_found) )
print( "Primes found:" )
#print last 10 primes
print( primes_found[-10:] )
print("Run time:", time_end-time_start)
Total number of primes found: 1229
Primes found:
[9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
Run time: 0.25299906730651855

The time reduced from 2.80 to 0.25 seconds just by iterating over the list and not n. That is over an order of magnitude improvement! Now the last major improvement... While the inside for loop change has dramatically sped up the code, we are still continuing to iterate over numbers when the 'still_prime' and 'checked_all' flags are checked... This isn't efficient as we have already met our condition statements once those happen. We should exit the loop after we have searched all possible choices.

Modification #4 Exit loop if non-prime is found or no more divisors exist.

upper_limit = 10001
primes_found = []

time_start = time.time()

for n in range(2,upper_limit+1):

    sqrt_of_n = n**0.5
    checked_all = False
    still_prime = True

    for p in primes_found:

        if still_prime and not checked_all:

            if p > sqrt_of_n:

                checked_all = True
            elif n%p == 0:
                still_prime = False

        # else!
        else:
            # exit from the loop
            break

    if still_prime:

        primes_found.append(n)

time_end = time.time()

print( "Total number of primes found:", len(primes_found) )
print( "Primes found:" )
#print last 10 primes
print( primes_found[-10:] )
print("Run time:", time_end-time_start)
Total number of primes found: 1229
Primes found:
[9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
Run time: 0.01798725128173828

Amazing, another order of magnitude of computation time reduction all because of exiting a loop. Recall that the original algorithm time was 5.71 seconds, now reduced to 0.018 seconds. This is done on my semi-poweful desktop, and other individuals might have different results, but the improvements are real!

This excercise makes it very clear to see how even the simplest of computer code can quickly turn ugly and inefficient! All through the better understanding of the mathematics behind the problem and the engineering techniques to help execute.

*the internet has entered the chat *

While the method that we have developed is certainly fast and we have made some great improvements, it is not the fastest by any means. I think the more valuable lesson is the subtle modification of code to extract over two orders of magnitude of performance increase! Upon finalizing this post, I decided to see what the rest of the internet community had to share on 'fastest prime sieves'. I was not disappointed when I came across this stackoverflow post. Now the input limits are constrained to N>=6 the succinctness and application of very non-intuitive instructions makes this code lightning fast. This is the code I am using now for all of my Project Euler problems now!

def primesbelow(N):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    #""" Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
upper_limit = 10001
primes_found = []


time_start = time.time()
primes_found = primesbelow(upper_limit)
time_end = time.time()

print( "Total number of primes found:", len(primes_found) )
print( "Primes found:" )
#print last 10 primes
print( primes_found[-10:] )
print("Run time:", time_end-time_start)
Total number of primes found: 1229
Primes found:
[9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
Run time: 0.0010001659393310547

0.001 seconds for 10001 numbers. Another order of magnitude reduced due to some very clever arrangement of what is important in the prime number problem. This is the great value of the community of nerds. It is worth mentioning that none of the algorithms are parallel processing, quantum computing, next-gen architecture. But we can still continue to search for the most efficient methods and hopefully apply them to daily life in meaningful ways. For example, the UMAP method that created the background of my homepage where the prime number list is projected to a 2D space to show some fascinating geometrical structures that have yet to be fully understood!

As always, I'm hoping to improve on this work. Thanks for reading!

Consider this problem solved.

QED


Comments

comments powered by Disqus