Working through the Project Euler problems, and with the comment from Arcane Sentiment about filtering based off of a set of primes, I was trying to program the Sieve of Eratosthenes. I also realized that the Sieve becomes a set based problem, so I went about trying to solve it using sets.

My first approach was simply using the difference function given in clojure.set.

(defn primes-under [n] (loop [sieve (set (cons 2 (range 3 n 2))) f 3] (if (> (square f) n) sieve (recur (difference sieve (range (square f) n f)) (inc f)))))

I start with the set of numbers starting by cons-ing 2 onto the range from 3 to n, skiping every other number, to get the odd numbers up to n.

The new sieve becomes the the difference between the current sieve, and the multiples of the factor f. I use (range (square f) n f) as a subset of the factors of f up to the number n. I start the range at the square of the factor, as any multiples below the square of the factor would already have been sieved out from the factors below it. So for multiples of 5, I can ignore (10, 15, 20) as they are multiples of 2 and 3, the other primes below 5, and as a result I only need to get multiples of 5 starting at 25.

When timing this solution by using:

(time (dorun (primes-under 1000000)))

The timing for the primes under 1,000,000 is around 2.1 seconds, and anywhere from 6 to 10 seconds for the primes under 2,000,000.

I wasn’t happy with this performance, especially after seeing Christophe Grand’s solution and timing on his blog post Everybody loves the Sieve of Eratosthenes.

I decided to try my hand at solving this using a transient set. After many missteps in using a transient set, due to not having used them before, I came up with the following solution:

(defn primes-under2 [n] (let [sieve (transient (set (cons 2 (range 3 n 2))))] (loop [s sieve f 3] (cond (> (square f) n) (persistent! s) :else (recur (reduce disj! s (range (square f) n f)) (inc f))))))

I define sieve as a transient set, and instead of using difference, I call reduce on the sieve using the function disj! for the ranges of numbers to filter out. I need to use disj! to remove that number from the sieve, as this is now a transient set I am working against.

The timing of this solution is now down to 1.1 seconds for primes under 1,000,000, and primarily in the 3 second range for the primes under 2,000,000.

I am still not quite satisfied with this solution, and would love suggestions as to how to avoid those numbers which have been filtered out of the sieve already. As I just do a (inc f), I would still get the numbers 4, 6, 8, 9, 25, etc, that are multiples of other primes, but I could not figure out how to skip these numbers. I tried the functions contains?, get, superset?, subset?, difference, but these were not giving me the result when dealing with the transient set.

If someone could tell me if I have missed anything I should try for filtering out non-prime numbers as factors, I would love to get your suggestions to try it out, and I will post a follow up with the results.

–Proctor

#1 by Steven Edwards on June 13, 2012 - 04:47

Great observation! Since the set is transient, there’s no need to include it in the loop/recur block. e.g.,

—

(defn primes-under3 [n]

(let [sieve (transient (set (cons 2 (range 3 n 2))))]

(loop [f 3]

(cond (> (square f) n) (persistent! sieve)

:else (do (reduce disj! sieve (range (square f) n f)) (recur (inc f)))))))

—

That shaves 10-20% off of running it on my laptop. (4.1s vs 4.8s best case, 6.0s vs 7.0s worst case, 5.0s vs 5.5s average case)

#2 by Stig Brautaset (@stigbra) on February 2, 2014 - 16:58

Where’s the square function from? I was benchmarking this vs some other implementations and since this didn’t compile because of the missing square function I declared it as `(defn square [x] (* x 2))`. However, in my benchmarking it is awfully slow compared to Christophe’s solution while you got comparable numbers – so I’m wondering whether I’ve done something weird…

#3 by Stig Brautaset (@stigbra) on February 2, 2014 - 17:01

My apologies… I must have screwed up my benchmarks. I see now just ~1.5x difference.