Clojure performs exceptionally well considering it is a dynamic language.
My rule of thumb is:
- It’s 5-10x slower than optimised Java if you write idiomatic Clojure (with liberal use of higher order functions, lazy sequences and all that goodness)
- It’s about the same speed as Java if you really optimise hard – you can basically generate approximately the same JVM bytecode.
This StackOverflow question prompted me to write a quicksort implementation in Clojure, which I’ve included here to demonstrate that matching Java speed is entirely possible within Clojure.
Warning against premature optimisation: this is not pretty code. You shouldn’t be writing code like this unless you have profiled your code and identified a real performance issue.
So here’s the code:
(set! *unchecked-math* true) (defmacro swap [a i j] `(let [a# ~a i# ~i j# ~j t# (aget a# i#)] (aset a# i# (aget a# j#)) (aset a# j# t#))) (defmacro apartition [a pivot i j] `(let [pivot# ~pivot] (loop [i# ~i j# ~j] (if (<= i# j#) (let [v# (aget ~a i#)] (if (< v# pivot#) (recur (inc i#) j#) (do (when (< i# j#) (aset ~a i# (aget ~a j#)) (aset ~a j# v#)) (recur i# (dec j#))))) i#)))) (defn qsort ([^longs a] (qsort a 0 (alength a))) ([^longs a ^long lo ^long hi] (let [lo (int lo) hi (int hi)] (when (< (inc lo) hi) (let [pivot (aget a lo) split (dec (apartition a pivot (inc lo) (dec hi)))] (when (> split lo) (swap a lo split)) (qsort a lo split) (qsort a (inc split) hi))) a)))
The key tricks are:
- Do everything with primitive arithmetic
- Use ints for the array indexes (this avoids some unnecessary casts, not a big deal but every little helps....)
- Use macros rather than functions to break up the code (avoids function call overhead and parameter boxing)
- Use loop/recur for maximum speed in the inner loop (i.e. partitioning the subarray)
- Avoid constructing any new objects on the heap (so avoid vectors, sequences, maps etc.)
Finally the benchmark code and test runs. Note that when testing this kind of code you want to run it a few times to get a "steady state" timing which is free of JVM and possibly processor cache warm-up effects (as you can see the first couple of runs are noticeably slower):
(dotimes [i 10] (let [ys (long-array xs)] (time (qsort ys)))) "Elapsed time: 17.392009 msecs" "Elapsed time: 13.020082 msecs" "Elapsed time: 11.264932 msecs" "Elapsed time: 11.974766 msecs" "Elapsed time: 11.805373 msecs" "Elapsed time: 11.925635 msecs" "Elapsed time: 11.799507 msecs" "Elapsed time: 11.567418 msecs" "Elapsed time: 11.297931 msecs" "Elapsed time: 11.367228 msecs"
dnolen created an improved version of the quicksort that uses functions rather than macros. This is much nicer code - thanks dnolen for providing something much more beautiful! The macro version above is still marginally faster (on my machine at least) but the difference isn't big enough to justify the extra code complexity that this brings.
For anyone reading this post in the future, I'd recommend studying this version before resorting to macros.