Spectrograms are fascinating: they visualise sound in terms of its constituent frequencies. I’ve been playing with Overtone lately, so I decided to create a mini-library to produce spectrograms from Overtone buffers.
Here’s a sample output:
This particular image is a visualisation of part of a trumpet fanfare. I like it because you can clearly see the punctuation of the different notes, and the range of strong harmonics above the base note. Read on for some more details on how this works.
I’m using Overtone (http://overtone.github.io/), which is an amazing open source audio programming environment. To help with the fast Fourier transforms (FFTs) I’m using vectorz-clj (https://github.com/mikera/vectorz-clj). I’m also using the Imagez library (https://github.com/mikera/imagez) for image processing and colour functions.
The source code for my spectrogram experiment can be found here:
Getting The Data
The first thing we need to do is get a sample. There’s a nice feature in Overtone that enables you to download a sample from freesound, so we’ll just use this:
(def samp-buf (load-sample (freesound-path 49477)))
Next we need to transform this into a double array. This gives us a linear array of sound samples that we can process using the Fast Fourier Transform. Double arrays are my go-to format for most intermediate data processing: they have good numerical accuracy, good performance, and are well supported by most of the analytics tools and libraries.
(def arr (into-array Double/TYPE (buffer-read samp-buf)))
Applying the FFT
Now we need to run the FFT on the double array to get the raw spectrogram data. This is a little fiddly, as you need to:
- Break the sample into a series of overlapping “windows”, each with a size that is a power of two.
- Run the FFT on each window. This converts a series of samples over time into an equivalent set of frequencies.
- Copy the FFT results into an output buffer. This is complicated by the fact that the FFT produces complex numbers, so you typically need to compute the magnitude of these for your output.
- Advance to the next window and repeat (note that I used a window size of 8192 and a time advance of 1000 samples, so all of the windows are overlapping).
I used the following code to do this:
(defn fft-matrix [^doubles arr]
  (let [n (count arr)
        length 8192                          ;; length of FFT window
        half-length (quot length 2)
        height (min 400 (quot half-length 2))
        fft (mikera.matrixx.algo.FFT. (int length))
        tarr (double-array (* 2 length))
        stride 1000
        ts (quot (- n length) stride)
        result-array (double-array (* height ts))]
    (dotimes [i ts]
      (System/arraycopy arr (* i stride) tarr 0 length)
      (.realForward fft tarr)
      (dotimes [j height]
        (aset result-array
              (+ i (* j ts))
              (mag (aget tarr (* j 2)) (aget tarr (inc (* j 2)))))))
    (Matrix/wrap height ts result-array)))
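The “mag” helper called in the inner loop isn’t shown in this excerpt. Assuming it simply computes the magnitude of a complex number from its real and imaginary parts, a minimal sketch would be:

```clojure
;; Sketch of the "mag" helper (assumed behaviour, not shown in the
;; original source): magnitude of the complex number re + im*i.
(defn mag ^double [^double re ^double im]
  (Math/sqrt (+ (* re re) (* im im))))
```

This is the standard way to collapse each complex FFT output into a single real intensity value for plotting.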
The result of this code is a Vectorz Matrix that contains the spectrogram data. Within this matrix:
- Each column represents a slice of time (at 1000/44100 second intervals)
- Each row represents a frequency band within the spectrogram
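As a sanity check on the frequency axis: with a 44100 Hz sample rate and an 8192-sample FFT window, row j of the matrix corresponds to a frequency of roughly j × 44100 / 8192 Hz, i.e. about 5.4 Hz per bin. A tiny illustrative helper (hypothetical, not part of the library):

```clojure
;; Approximate centre frequency (Hz) of FFT bin j, assuming a
;; 44100 Hz sample rate and the 8192-sample window used above.
(defn bin-freq ^double [j]
  (/ (* j 44100.0) 8192.0))
```

So the 400 rows kept by fft-matrix cover roughly the bottom 2.2 kHz of the spectrum, which is where most of the interesting harmonic structure of the trumpet sample lives.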
Finally we need to visualise the matrix. This is comparatively simple; the key steps are:
- Create a BufferedImage to store the resulting image
- Loop over each row and column of the spectrogram
- Set the BufferedImage pixel at each location to a colour calculated from the spectrogram value. I used the “heatmap” function from Imagez to get a nice set of colours.
(defn render
  "Renders a spectrogram matrix into a BufferedImage"
  ([M]
    (render M (img/new-image (mat/column-count M) (mat/row-count M))))
  ([^AMatrix M ^BufferedImage bi]
    (let [w (.getWidth bi)
          h (.getHeight bi)]
      (dotimes [x w]
        (dotimes [y h]
          (.setRGB bi (int x) (- (dec h) (int y))
            (unchecked-int (spec/heatmap (* 0.005 (.get M (int y) (int x)))))))))
    bi))
Putting it together
Finally to display the spectrogram you just need to string the pieces together with the handy “show” function from Imagez.
(img/show (render (fft-matrix arr)))
Performance seems pretty good. Computing the spectrogram and rendering it as above takes just 146ms on my laptop for an 8.5 second sample: i.e. we can render spectrogram visualisations about 60x faster than the actual duration of the samples on a single core.
There are a few possible directions to extend this little visualisation tool, which I may attempt if I get the time:
- Better signal processing by using window functions (e.g. Hamming windows) – this should get a better spectrogram result
- Realtime visualisation of data – should be easily feasible on a modern machine with this technique. The main challenges would be synchronising the visualisation with continuously updating buffer data
- Reactive feedback – using a realtime spectrogram to drive Overtone sound generation opens some interesting creative possibilities, where the visualisation of the sound drives the evolution of the sound in an infinite feedback loop. Both chaotic and stable patterns could emerge…
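On the first of those ideas: a Hamming window is just an element-wise multiplier applied to each sample window before running the FFT, which tapers the edges of the window and reduces spectral leakage. A minimal sketch of how one could be generated:

```clojure
;; Hamming window of length n: w[i] = 0.54 - 0.46 * cos(2*pi*i/(n-1)).
;; Multiplying each FFT input window element-wise by this (instead of
;; using the raw rectangular window) reduces spectral leakage.
(defn hamming-window ^doubles [^long n]
  (let [w (double-array n)]
    (dotimes [i n]
      (aset w i
            (- 0.54 (* 0.46 (Math/cos (/ (* 2.0 Math/PI i) (dec n)))))))
    w))
```

In fft-matrix this would slot in just after the System/arraycopy call: multiply each element of tarr by the corresponding window coefficient before calling .realForward.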