Clojure performance tip – exploiting primitive casts

Posted: August 6, 2012 in Uncategorized

This is a useful trick for getting much better performance out of numerical code in Clojure.

Suppose you have a function that calculates the length of a 3D vector in Clojure:

(defn length1 [x y z]
  (Math/sqrt (+ (* x x) (* y y) (* z z))))

(time (dotimes [i 1000000] (length1 10 10 10)))
"Elapsed time: 92.773663 msecs"

OK, it’s already pretty fast at 93ns per call. But we can do better. Clojure by default provides support for arbitrary precision arithmetic and a variety of different numeric types. But we are often happy to work with a more restricted range of numbers for performance reasons – 64-bit double-precision numbers are “good enough” for many numerical applications, and calculating the length of a 3D vector is one of these. So we can use primitive casts to tell Clojure that yes, we definitely want to treat our parameters as doubles:

(defn length2 [x y z]
  (let [x (double x)
        y (double y)
        z (double z)]
    (Math/sqrt (+ (* x x) (* y y) (* z z)))))

(time (dotimes [i 1000000] (length2 10 10 10)))
=> "Elapsed time: 11.333821 msecs"

That optimisation paid off very nicely – an 8x speedup in return for three little casts!

What actually happened here?

  • In the first case, the Clojure compiler doesn’t know anything about the types of x, y, and z. The best it can do is pass them to the generic versions of clojure.core/* which take an Object and return an Object. These functions therefore have to examine their arguments to work out the appropriate way of dealing with whatever numerical type they are presented with (which could be a Long, a Double, a BigInteger, or any arbitrary implementation of java.lang.Number)
  • In the second case, we explicitly cast the arguments immediately to primitive doubles. This has a small initial cost, but the fast path is very quick. Subsequently, the compiler knows that the (shadowed) values of x, y and z are primitive doubles. It can then directly use fast double multiplication and addition (which is just as fast as it would be in pure Java code).
Comments
  1. Or how about:

    (defn length3 [^double x ^double y ^double z]
    (Math/sqrt (+ (* x x) (* y y) (* z z))))

  2. endbegin says:

    I am glad Clojure is coming along here, in the sense that a simple type optimization has such a huge impact on performance.

    • mikera7 says:

      I think this is the general direction of the language – dynamic typing by default, but static typing available when you need it (either by explicit tying, or perhaps by type inference through something like Typed Clojure)

  3. “Clojure by default provides support for arbitrary precision arithmetic and a variety of different numeric types.”

    As of Clojure 1.3, the default is performance, not precision, surely? Long and Double by default. You need to specify N or M on literals to get the arbitrary precision. So type hinting is the difference between Object, boxed primitive and raw primitive, depending on what hint you provide.

    • mikera7 says:

      Hi Sean I completely agree – the “default” support I’m referring to is that if you write a function using “+” or “*” then the Clojure compiler will (in the absence of any hints) produce code that checks for arbitrary precision numbers and returns a (potentially arbitrary precision) number. You need to give it enough clues to avoid this behaviour if you want the performance version (which Clojure 1.3+ can certainly provide…)

  4. edorrington says:

    The speedup may actually be even greater than you show. On my Macbook, using 20,000,000 iterations in order to bump the duration up, the optimized version is actually about 50x faster.

  5. mikkom says:

    .. And this is exaclty why Clojure should have type inference.

  6. Hi,

    Thanks for providing these experiments. I was curious about type hinting, so I wrote two more functions and tested them:

    (defn length3 [a b c]
    (let [x ^double a
    y ^double b
    z ^double c]
    (Math/sqrt (+ (* x x) (* y y) (* z z)))))

    (defn length4 [^double x
    ^double y
    ^double z]
    (Math/sqrt (+ (* x x) (* y y) (* z z))))

    Here are my results:

    length1 :
    “Elapsed time: 210.436788 msecs”
    length2 :
    “Elapsed time: 27.204235 msecs”
    length3 :
    “Elapsed time: 36.905093 msecs”
    length4 :
    “Elapsed time: 18.779471 msecs”

    So it is interesting to see that length4 seems to be the fastest of the four candidates (with type hints directly on the input parameters), whereas length3 that has type hints on the local variables inside the function is significantly slower than. Do you have any idea of what happens?

    • mikera7 says:

      I suspect Clojure is inserting unnecessary local variables in length3. You don’t need these if the parameters are already primitive.

      Having said that, I’m a bit surprised that the JIT doesn’t optimise this away. Have you tried using a benchmarking library like Criterium perhaps, which takes care of JIT warmup? You might find they come out the same.

Leave a comment