I’ve recently been toying with the idea of using music as a form of exploratory data analysis. While the use of sound to monitor data isn’t new, it’s still relatively uncommon. I occasionally find myself trying to make sense of large data sets, and finding a way to quickly analyse them and pick out the points of interest can be quite tedious. So I thought about ways someone with no musical skill could generate sound from data and produce something relatively melodic, and useful for highlighting patterns and anomalies in the data.

Sing a song of software,

To test this out I put together a little tune that covers the past 24 years of stock information from Microsoft (Piano), Apple (Clarinet) and Google (Xylophone). The pitch is proportional to the price of the stock, with low tones being low prices and high tones being high prices. The volume of each instrument is proportional to the volume of sales over the period, so a quiet note is a low-volume day, while a loud note is a period of higher trading volume.

There are two versions of the music available:

A shorter, 2 minute, up-tempo version using weekly stock prices: OGG, Midi. This one is short and to the point, but some of the nuances, like big daily trade spikes, are missed.

A longer, 17 minute, version using daily prices: OGG, Midi. This one is a little monotonous at the start, but it makes it much easier to hear Apple grow from a tiny instrument in the background into a larger force. It also lets you hear some of Apple’s big trading days.

Bubbles full of lies;

A few things to listen out for:

  • Early on, listen for Microsoft’s speedy ascent during the 2000s tech boom, and an even quicker decline. (About 1:00 in on the quicker version.)
  • Apple has, for a long time, a consistently low trade volume; however, from the early 2000s you will occasionally hear loud notes. (About x minutes in.) These are peaks in stock sales, probably around MacWorld and iPod/Phone/Pad announcements.
  • After about 2005, you can hear Apple and Google slowly rise in volume and stock price, while Microsoft remains in a consistent range throughout the same period. (After 2:00 in the short version)

4 and 20 years of stocks,

The data for all of this came from the historical stock price data sets available on Google Finance. Why 24 years’ worth? Because it fit with the theme of the nursery rhyme I was trying to mimic. It’s pretty touch and go as to what data you can download from Google Finance, but to be fair, from my understanding this is an issue with the exchanges rather than with Google.

Audio-lised with Py(thon).

So the nitty gritty on how it works:

It’s a Python script that loops through a set of data files exported from Google Finance and uses midiutil to create a Midi file. Each day’s (or week’s) data point is weighted so the values remain within a specific range for a specific instrument, and the volumes are adjusted so that each instrument can be heard. Without either of these it really is quite a mish-mash of sound.
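
To make that a bit more concrete, here is a minimal sketch of the idea. It is not the original script, just an illustration under a few assumptions: the CSV column names (Close, Volume), the rows being ordered oldest first, the pitch ranges, the velocity floor of 30 and the helper names are all made up for the example. The instrument numbers are the 0-based General MIDI programs for piano, clarinet and xylophone.

```python
import csv
import io

# Hypothetical per-instrument settings: 0-based General MIDI program number
# and the MIDI pitch range each stock's prices are squeezed into.
INSTRUMENTS = {
    "MSFT": {"program": 0, "low": 36, "high": 60},   # Acoustic Grand Piano
    "AAPL": {"program": 71, "low": 60, "high": 84},  # Clarinet
    "GOOG": {"program": 13, "low": 72, "high": 96},  # Xylophone
}


def rescale(value, src_min, src_max, dst_min, dst_max):
    """Linearly map value from [src_min, src_max] onto [dst_min, dst_max]."""
    if src_max == src_min:
        return (dst_min + dst_max) // 2
    fraction = (value - src_min) / (src_max - src_min)
    return round(dst_min + fraction * (dst_max - dst_min))


def rows_to_notes(rows, low_pitch, high_pitch, min_velocity=30, max_velocity=127):
    """Turn CSV rows into (pitch, velocity) pairs, one per data point.

    Price sets the pitch, trading volume sets the loudness (MIDI velocity).
    The velocity floor keeps quiet days audible.
    """
    closes = [float(row["Close"]) for row in rows]
    volumes = [float(row["Volume"]) for row in rows]
    return [
        (rescale(c, min(closes), max(closes), low_pitch, high_pitch),
         rescale(v, min(volumes), max(volumes), min_velocity, max_velocity))
        for c, v in zip(closes, volumes)
    ]


def write_midi(notes_by_symbol, path, tempo=240):
    """Write one track per stock, one beat per data point."""
    # Imported here so the scaling helpers work without the library installed.
    from midiutil import MIDIFile  # third-party: pip install MIDIUtil

    midi = MIDIFile(len(notes_by_symbol))
    for track, (symbol, notes) in enumerate(notes_by_symbol.items()):
        channel = track  # one channel per instrument
        midi.addTempo(track, 0, tempo)
        midi.addProgramChange(track, channel, 0, INSTRUMENTS[symbol]["program"])
        for beat, (pitch, velocity) in enumerate(notes):
            midi.addNote(track, channel, pitch, beat, 1, velocity)
    with open(path, "wb") as handle:
        midi.writeFile(handle)


# Made-up sample data standing in for a downloaded price file.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2010-01-04,10,10,10,10.0,100
2010-01-05,20,20,20,20.0,500
2010-01-06,30,30,30,30.0,200
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
aapl = INSTRUMENTS["AAPL"]
notes = rows_to_notes(rows, aapl["low"], aapl["high"])
print(notes)
# To produce a file (needs MIDIUtil): write_midi({"AAPL": notes}, "stocks.mid")
```

Giving each stock its own pitch range is what keeps the instruments from trampling each other, and the velocity floor stops low-volume days from disappearing entirely.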

This output Midi file is then run through Timidity++ to create an Ogg/Vorbis file. Converting to Ogg is only necessary for consistency, but both the Midi and the Ogg are available.
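
The conversion step can be driven from Python as well. This is only a sketch: it assumes the timidity binary is on the PATH and was built with Ogg Vorbis support, and the file names and helper names are placeholders. The -Ov option selects Ogg Vorbis output and -o names the output file.

```python
import shutil
import subprocess


def timidity_command(midi_path, ogg_path):
    """Build the TiMidity++ command line that renders a Midi file to Ogg Vorbis."""
    # -Ov: Ogg Vorbis output mode, -o: output file name
    return ["timidity", midi_path, "-Ov", "-o", ogg_path]


def render_to_ogg(midi_path, ogg_path):
    """Run TiMidity++ and raise if it is missing or exits with an error."""
    if shutil.which("timidity") is None:
        raise RuntimeError("timidity was not found on the PATH")
    subprocess.run(timidity_command(midi_path, ogg_path), check=True)


# Example (placeholder file names): render_to_ogg("stocks.mid", "stocks.ogg")
```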

Future work and ideas

Well, the goal is to be able to use a technique like this to listen to large multi-variate datasets that have either a time dimension or a continuous dependent variable (heights, weights, etc…). As long as one dimension has values that are relatively evenly and closely distributed, with few overlaps and a wide enough spread, it should be possible to ‘graph’ almost any dataset meaningfully as audio.