This is a visualisation I made for UPSingapore’s “Data in the City” hackathon.
It is an animated visualisation of 24 hours of mobile call data: the colours change to show the volume of mobile calls being made in each location over the course of the day.
The visualisation code is here and some details of how it was made follow below.
The raw data came from 34 million rows of mobile call data, generously provided by Singtel for the hackathon: about 1.5 GB in total. The strategy for loading it was:
- Pre-build a vector of 96 vectorz-clj matrices, each containing 80×100 double values, so that in effect we have one big 96×80×100 array. Each matrix holds the number of calls made in one 15-minute period (96 × 15 minutes = 24 hours).
- Create a lazy sequence of lines from the source data .csv file. We need to use a lazy sequence because of the size of the file: it’s too big to fit efficiently in working memory, but fortunately the combination of lazy sequences and the JVM’s automatic garbage collection will take care of that problem for us, providing we don’t hold onto the head of the sequence.
- Iterate over all the lines in the file, parsing each one to find the timestamp and geospatial location of the call. These index into the correct cell of the 96×80×100 data array, and the call count for that cell is incremented in place with a mutable operation on the matrix.
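A minimal, dependency-free sketch of this loading strategy follows. It is not the original code: it assumes a hypothetical CSV layout of `minute-of-day,row,col` per line (the real Singtel schema isn't described here), assumes the grid is 80 rows by 100 columns, and uses plain Java `double[]` arrays in place of vectorz-clj matrices so that it runs with Clojure alone. The essential points carry over: `line-seq` reads the file lazily, `reduce` consumes it without retaining the head, and each call increments one cell in place.

```clojure
(ns example.load
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

;; ASSUMED grid size: 96 frames of 80 rows x 100 columns.
(def ^:const FRAMES 96)
(def ^:const ROWS 80)
(def ^:const COLS 100)

(defn empty-frames
  "Pre-builds one mutable ROWS*COLS double array per 15-minute frame."
  []
  (vec (repeatedly FRAMES #(double-array (* ROWS COLS)))))

(defn parse-line
  "Parses a line in the HYPOTHETICAL format \"minute-of-day,row,col\" into
  [frame row col]. Returns nil when a field is missing; throws
  NumberFormatException on non-numeric fields."
  [^String line]
  (let [[m r c] (str/split line #",")]
    (when (and m r c)
      [(quot (Long/parseLong (str/trim m)) 15)
       (Long/parseLong (str/trim r))
       (Long/parseLong (str/trim c))])))

(defn count-calls!
  "Consumes the lines of rdr lazily, incrementing one cell per call.
  Calls outside the grid are skipped. Mutates and returns frames."
  [^java.io.BufferedReader rdr frames]
  (reduce
   (fn [fs line]
     (when-let [[f r c] (parse-line line)]
       (when (and (< -1 f FRAMES) (< -1 r ROWS) (< -1 c COLS))
         (let [^doubles a (nth fs f)
               idx (+ (* r COLS) c)]
           ;; mutable increment of a single cell
           (aset a idx (+ (aget a idx) 1.0)))))
     fs)
   frames
   (line-seq rdr)))
```

Against a real file this would be called as `(with-open [rdr (io/reader "calls.csv")] (count-calls! rdr (empty-frames)))`, where `calls.csv` is a placeholder name.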
Processing the entire data set takes about a minute, which is pretty impressive given the number of records: 34 million rows in roughly 60 seconds works out at several hundred thousand rows per second (about 570,000). I suspect this could be even faster with some custom parsing.
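The post doesn't say what the custom parsing would look like. One common approach, sketched here as an assumption, is to scan each line once and accumulate digits directly, instead of splitting with a regex and calling `Long/parseLong` on each substring. `parse-fields` is a hypothetical helper that assumes each line holds only comma-separated non-negative integers; whether it actually beats a regex split on the real data would need measuring.

```clojure
(ns example.parse)

(defn parse-fields
  "Parses the first n comma-separated non-negative integers in line,
  e.g. \"15,2,3\". Scans the characters once without creating substrings.
  ASSUMES the line contains only digits and commas."
  ^longs [^String line ^long n]
  (let [^longs out (long-array n)
        len (.length line)]
    (loop [i 0, field 0, acc 0]
      (if (or (== i len) (== field n))
        ;; end of line: store the last field if there is room for it
        (do (when (< field n) (aset out field acc))
            out)
        (let [ch (.charAt line i)]
          (if (= ch \,)
            ;; field separator: store the accumulated value, start a new one
            (do (aset out field acc)
                (recur (inc i) (inc field) 0))
            ;; digit: shift the accumulator left one decimal place and add it
            (recur (inc i) field (+ (* acc 10) (- (int ch) 48)))))))))
```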
The overall design of the visualisation was as follows:
- Loop continuously over each of the 96 matrices, which each represent a “frame” of the animated visualisation
- For each frame, construct an image from the corresponding call volume matrix
- Display the image, and add the extra graphical details (the time clock and the Singapore coastal outline)
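A minimal sketch of that loop, assuming the caller supplies two hypothetical functions: `render-fn`, which builds an image for a frame index, and `show-fn`, which puts it on screen. `cycle` over `(range 96)` gives the endless 0..95, 0..95, ... sequence; the `steps` argument exists only so the loop can be bounded, for example in a test.

```clojure
(ns example.animate)

(def ^:const FRAMES 96)

(defn frame-sequence
  "Infinite lazy sequence of frame indices 0..95, repeated forever."
  []
  (cycle (range FRAMES)))

(defn run-animation!
  "Renders and displays `steps` frames in order, pausing `delay-ms` between
  them. render-fn maps a frame index to an image; show-fn displays it.
  Both are supplied by the caller (hypothetical here)."
  [render-fn show-fn steps delay-ms]
  (doseq [i (take steps (frame-sequence))]
    (show-fn (render-fn i))
    (when (pos? delay-ms)
      (Thread/sleep (long delay-ms)))))
```

Because each frame is rendered on demand from its matrix, the loop itself only holds one rendered image at a time; for a never-ending display it could run inside a `future` with a very large step count.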
The image construction is perhaps the most interesting part. I ended up writing a short custom renderer to do this:
;; define colour for each number of calls
(defn col ^long [^double val]
  ;; log scale: lval is 0.0 for no calls and reaches 4.0 at roughly 27,800 calls
  (let [lval (* (Math/log10 (+ 1.0 val)) 0.9)]
    (cond
      (<= lval 0.0) 0xFF000000                                        ; no calls: black
      (<= lval 1.0) (let [v (- lval 0.0)] (col/rgb 0.0 0.0 v))         ; black -> blue
      (<= lval 2.0) (let [v (- lval 1.0)] (col/rgb v 0.0 (- 1.0 v)))   ; blue -> red
      (<= lval 3.0) (let [v (- lval 2.0)] (col/rgb 1.0 v 0.0))         ; red -> yellow
      (<= lval 4.0) (let [v (- lval 3.0)] (col/rgb 1.0 1.0 v))         ; yellow -> white
      :else 0xFFFFFFFF)))                                             ; saturated: opaque white

;; create an image frame from a matrix
;; (GW and GH are the grid width and height, defined elsewhere)
(defn city-image ^BufferedImage [^AMatrix data]
  (let [^BufferedImage bi (img/new-image GW GH)]
    (dotimes [y GH]
      (dotimes [x GW]
        ;; setRGB takes an int, so truncate the packed ARGB long
        (.setRGB bi (int x) (int y)
                 (unchecked-int (col (.get data (int y) (int x)))))))
    bi))
A few tricks used here:
- The custom col function creates the desired colour from the volume of calls. I used log(1 + number of calls) to drive this. This works well because it compresses the wide range of call counts into a reasonably small range of values, which made assigning colours pretty simple.
- Some direct Java interop to call the .setRGB method on the BufferedImage.
- Primitive type hinting / casts to ensure the code runs fast, as we need this to render in real time.
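To make the colour mapping concrete without the imaging library, here is a self-contained sketch of the same ramp. The `rgb` helper is an assumption standing in for the post's `col/rgb`: it packs three components in [0.0, 1.0] into an opaque ARGB value. With the 0.9 scale factor, `lval` is 0.0 for zero calls and passes 4.0 (saturated white) at roughly 27,800 calls in one cell. Packed colours with full alpha are larger than `Integer/MAX_VALUE` as longs, which is why `city-image` wraps `col` in `unchecked-int` before calling `.setRGB`.

```clojure
(ns example.colours)

(defn- channel
  "Converts a component in [0.0, 1.0] to an integer in 0..255."
  ^long [^double v]
  (-> (Math/round (* v 255.0)) (max 0) (min 255)))

(defn rgb
  "Packs red, green and blue components into an opaque ARGB long.
  ASSUMED stand-in for the colour helper used in the post."
  ^long [^double r ^double g ^double b]
  (bit-or 0xFF000000
          (bit-shift-left (channel r) 16)
          (bit-shift-left (channel g) 8)
          (channel b)))

(defn col
  "Maps a call count to a colour on a log scale:
  black -> blue -> red -> yellow -> white."
  ^long [^double calls]
  (let [lval (* (Math/log10 (+ 1.0 calls)) 0.9)]
    (cond
      (<= lval 0.0) 0xFF000000
      (<= lval 1.0) (rgb 0.0 0.0 lval)
      (<= lval 2.0) (let [v (- lval 1.0)] (rgb v 0.0 (- 1.0 v)))
      (<= lval 3.0) (rgb 1.0 (- lval 2.0) 0.0)
      (<= lval 4.0) (rgb 1.0 1.0 (- lval 3.0))
      :else 0xFFFFFFFF)))
```

For example, 99 calls gives `lval` = 0.9 × log10(100) = 1.8, which lands in the blue-to-red band as `0xFFCC0033`.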
Finally, the image needed a bit of post-processing to add the graphical extras.
(defn frame
  "Renders a specific frame of the visualization"
  ([i]
    (let [bi (img/zoom 8.0 (city-image (data i)))
          g (.getGraphics bi)]
      ;; overlay the Singapore coastal outline
      (.drawImage g outline (int 0) (int 0) nil)
      ;; draw the clock: frame i starts at hour (quot i 4), minute 15 * (mod i 4)
      (.drawString g
                   (str (text/pad-left (str (quot i 4)) 2 \0)
                        ":"
                        (text/pad-left (str (* 15 (mod i 4))) 2 \0))
                   (int 5) (int 20))
      bi)))
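The clock string in `frame` is assembled with `text/pad-left`. If you'd rather not depend on a text utility, `clojure.core/format` produces the same zero-padded `HH:MM` label; `time-label` is a hypothetical name used only for this sketch.

```clojure
(ns example.clock)

(defn time-label
  "Formats frame index i (0-95) as the HH:MM start time of its 15-minute slot."
  [i]
  (format "%02d:%02d" (quot i 4) (* 15 (mod i 4))))
```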
I used my Imagez library to provide the image zooming function. This has the nice effect of doing smooth colour interpolation when scaling up an image, which avoids the potential problem of “blocky” pixellated colours in the final visualisation.
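Without Imagez, a similar smooth-upscaling effect can be approximated with plain Java2D by requesting bilinear interpolation when drawing a scaled copy. This is a swapped-in technique, not necessarily how Imagez implements `zoom`, and `smooth-zoom` is a hypothetical name.

```clojure
(ns example.zoom
  (:import (java.awt RenderingHints)
           (java.awt.image BufferedImage)))

(defn smooth-zoom
  "Returns a copy of img scaled by factor, using bilinear interpolation so
  that neighbouring cell colours blend instead of forming visible blocks."
  ^BufferedImage [^BufferedImage img factor]
  (let [w (int (Math/round (* (double factor) (.getWidth img))))
        h (int (Math/round (* (double factor) (.getHeight img))))
        out (BufferedImage. w h BufferedImage/TYPE_INT_ARGB)
        g (.createGraphics out)]
    (try
      ;; bilinear interpolation blends the source pixels when scaling up
      (.setRenderingHint g RenderingHints/KEY_INTERPOLATION
                         RenderingHints/VALUE_INTERPOLATION_BILINEAR)
      (.drawImage g img 0 0 w h nil)
      (finally (.dispose g)))
    out))
```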
I was quite pleased with the overall experience of using Clojure and core.matrix to create this kind of visualisation.
Some things I think could do with a bit more polishing / library work:
- Making visualisations fast still needs quite a bit of Java interop. I think we could do more with macro magic here to enable efficient visualisation code in idiomatic Clojure, but it’s a tricky one: it’s hard to abstract away the underlying details while still ensuring efficient code.
- core.matrix needs some more IO routines to make it easy to save / load grids of data in readable formats. Nothing too complex, I’ll probably add these in an upcoming core.matrix release.
- I’d quite like a small library of generic visualisation techniques (simple, general-purpose stuff like handling and displaying images, animation loops etc.). Incanter has a lot of good stuff, as does Nurokit, but both are quite heavy dependencies to pull in. If I get some free time I may pull something like this together on GitHub.
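As a rough idea of the “nothing too complex” grid IO mentioned above, this sketch writes a grid of numbers (a sequence of rows) to CSV text and parses it back. `grid->csv` and `csv->grid` are hypothetical helpers built on `clojure.string`; they are not part of core.matrix.

```clojure
(ns example.grid-io
  (:require [clojure.string :as str]))

(defn grid->csv
  "Writes a grid (a sequence of rows of numbers) as CSV text, one row per line."
  [grid]
  (str/join "\n" (map #(str/join "," %) grid)))

(defn csv->grid
  "Parses CSV text produced by grid->csv into a vector of vectors of doubles.
  Blank lines are ignored."
  [^String text]
  (->> (str/split-lines text)
       (remove str/blank?)
       (mapv (fn [line] (mapv #(Double/parseDouble %) (str/split line #","))))))
```

Saving one frame would then be `(spit "frame-00.csv" (grid->csv rows))`, with a placeholder file name and `rows` standing in for a frame's values. CSV keeps the files human-readable, at the cost of size and parse speed compared with a binary format.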