25 Aug 2014
D3 stands for Data-Driven Documents (DDD). It simplifies the process of building visualizations on top of data by handling most of the math and boilerplate needed to generate visual elements.
In this post, we’ll discuss the basic aspects of D3.js, drawing heavily on Scott Murray’s excellent book, Interactive Data Visualization for the Web. The book is a relatively short and fun read, compiled from a series of tutorials Scott wrote in the past.
D3.js is an open-source library available on GitHub. We can clone the repository and use it in our code. The source code is spread across multiple files in d3/src/, which are compiled into /d3/d3.js using a Node.js module called smash, also by @mbostock.
The basic way to embed D3 in a web page is to follow this template:
We can do all the testing on localhost or use JSFiddle.
D3 handles DOM manipulation very neatly. For example, one of the first things we’ll do when writing D3 code is to select the body of our HTML page:
From there we can perform other DOM operations, such as adding new elements.
Selections use CSS3 selector syntax, so we can also select elements by class, as in ".myClassName", or by id (prefixed with "#").
Multi-selection. One key component of D3’s expressiveness is batching operations. This saves us from writing for loops and makes the code more concise. Say we have an HTML body like:
We can access all paragraphs by multi-selecting all <p> tags within the body:
This sets the contents of every paragraph to “Hello World”. In most cases we’ll want to pass a callback instead of a constant string, so that each entry is handled differently. For example, we could do:
Observe how heavily this relies on function chaining. Chaining only works when each method returns an object that supports the next call, so it can be fragile. The API is very well crafted though, and it usually behaves as we’d expect. Moreover, it makes the code much more legible by removing boilerplate and intermediate variables.
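To make the chaining idea concrete, here is a minimal sketch in plain JavaScript. MiniSelection is a hypothetical stand-in written for this post, not D3's real selection object, and its callbacks only receive the index (D3 passes the bound datum as well).

```javascript
// Hypothetical stand-in for a D3 selection; not D3's implementation.
class MiniSelection {
  constructor(nodes) {
    this.nodes = nodes; // plain objects standing in for DOM elements
  }

  // Set the text of every node; `value` is a constant or a callback
  // that receives the node's index.
  text(value) {
    this.nodes.forEach((node, i) => {
      node.text = typeof value === "function" ? value(i) : value;
    });
    return this; // returning `this` is what makes chaining possible
  }

  // Set an arbitrary attribute on every node.
  attr(name, value) {
    this.nodes.forEach((node) => {
      node[name] = value;
    });
    return this;
  }
}

// Three fake paragraphs, updated in one chained expression.
const paragraphs = new MiniSelection([{}, {}, {}])
  .text((i) => "Paragraph " + i)
  .attr("class", "highlight");
```

If text() forgot to return this, the following attr() call would fail, which is the fragility mentioned above.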
One of the most important operations in D3 is binding data to DOM elements. This is done via the data() method. For example, we could do:
First, we select all existing p elements, then we bind the data. The enter() method returns the entries from data that have no corresponding element in the current selection. More specifically, say selectAll() returned 2 existing p elements, that is, an array with indexes 0 and 1. Our data is an array of 4 elements, with indexes 0 to 3.
D3 treats indexes 0 and 1 as already present, so the values 3 and 5 are matched to the existing paragraphs and left out of enter(). The values 8 and 13 are “unbound”, so it appends one p element for each of them and sets its text.
It’s possible to specify the keys of the entries in data, but the default key is the array index. So let’s create keys for each of our entries:
Now we can verify all 4 paragraphs are rendered in addition to the 2 existing ones. For more details, see .
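The difference between index keys and custom keys can be sketched in plain JavaScript. The enterEntries helper below is hypothetical and only mimics the matching rule described above; it assumes the two existing paragraphs carry none of the custom keys.

```javascript
// Hypothetical helper: which data entries would end up in enter()?
// `existingKeys` holds the keys of elements already in the selection.
function enterEntries(existingKeys, data, keyFn) {
  const present = new Set(existingKeys);
  return data.filter((d, i) => !present.has(keyFn(d, i)));
}

const dataset = [3, 5, 8, 13];

// Default behaviour: the key is the array index, and two paragraphs
// already occupy indexes 0 and 1, so only 8 and 13 are new.
const byIndex = enterEntries([0, 1], dataset, (d, i) => i);

// Custom keys (the "value-" prefix is made up for illustration): the
// existing paragraphs match none of them, so every entry is new.
const byKey = enterEntries([], dataset, (d) => "value-" + d);
```

Here byIndex holds the last two values and byKey holds all four, matching the two outcomes described in the text.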
Let’s create a simple random column chart. We’ll use SVG elements to render the columns. First, we create an SVG element and set its dimensions using the
Then, we can generate one rect element per entry of our data. In the code below, note how we set attributes in batch, by calling attr() with an object that maps attribute names to values.
Running the above with 25 random points renders a simple column chart:
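The geometry behind such a chart can be sketched without D3. The columnRects helper, the 500x100 chart size and the 1px padding are assumptions made for illustration; the function only computes the x, y, width and height that each rect would receive.

```javascript
// Hypothetical helper: compute the attributes of one rect per data value.
function columnRects(values, width, height, padding) {
  const slot = width / values.length; // horizontal space per column
  const max = Math.max(...values);    // the tallest column fills the chart
  return values.map((v, i) => {
    const h = (v / max) * height;
    return {
      x: i * slot,
      y: height - h, // SVG's y axis points down, so columns are flipped
      width: slot - padding,
      height: h,
    };
  });
}

// 25 random values between 1 and 30, mirroring the example in the text.
const randomValues = Array.from({ length: 25 }, () =>
  1 + Math.floor(Math.random() * 30)
);
const rects = columnRects(randomValues, 500, 100, 1);
```

Each object in rects could then be passed to attr() in a single call, as described above.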
A scale is essentially a function: it maps a set of input values to a set of output values. For example, suppose our data has x values ranging from 40 to 100, our chart is 1200px wide, and we want to map the data onto the interval [0,1200]. The most natural way to map one continuous interval onto another is through a linear transformation. We could write a function to do that ourselves, but D3 makes it very easy to set up such a mapping:
In this syntax, domain is the input and range is the output.
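For comparison, this is roughly the function we would otherwise write by hand. It is a plain-JavaScript sketch of the linear transformation, not D3's implementation, and linearScale is a hypothetical name.

```javascript
// Hypothetical hand-written linear scale: domain is the input interval,
// range is the output interval.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

const xScale = linearScale([40, 100], [0, 1200]);

const left = xScale(40);    // 0, the left edge of the chart
const middle = xScale(70);  // 600, halfway through the domain
const right = xScale(100);  // 1200, the right edge of the chart
```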
Scales are important for axes, because an axis is essentially a visual representation of a scale. Creating a basic axis from a scale is simple:
This will place an x-axis at the top of the chart. As we know, the x-axis is commonly positioned at the bottom of the chart, so we need to perform a y-translation of
This causes the axis not to be shown, because it is displaced beyond the limits of the SVG element. One way around that is to add some extra height when defining the SVG height:
Another important aspect of data visualization is interactivity.
In D3 we can set event listeners on SVG elements through the on() method. It takes an event name (such as “click”, “mouseover”, or “mouseout”) and a callback to run when the event fires. A simple example is setting a listener on the rectangles of our column chart. Let’s color a rectangle orange on hover:
Note that, within the callback passed to on(), this is bound to the SVG element on which we set up the listener.
The result can be seen on this JSFiddle.
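The this binding mentioned above can be illustrated without a browser. In the sketch below, makeElement and fire are hypothetical helpers standing in for an SVG element and for the browser dispatching an event, and the "steelblue" starting color is an assumption.

```javascript
// Hypothetical stand-in for an element that supports on(); not D3 code.
function makeElement(fill) {
  const listeners = {};
  return {
    fill,
    on(eventName, callback) {
      listeners[eventName] = callback;
      return this; // allow chaining, as D3 does
    },
    // Simulate the browser firing an event on this element.
    fire(eventName) {
      const callback = listeners[eventName];
      if (callback) callback.call(this); // `this` in the callback is the element
    },
  };
}

const rect = makeElement("steelblue")
  .on("mouseover", function () {
    this.fill = "orange";
  })
  .on("mouseout", function () {
    this.fill = "steelblue";
  });

rect.fire("mouseover"); // rect.fill is now "orange"
```

The callbacks are regular functions on purpose: an arrow function does not get its own this, so it would not see the element.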
Constructing a column/bar chart is relatively straightforward using regular SVG rectangles and the D3 axis helper functions. Chart types like pie charts, on the other hand, involve working with radians and more complicated math.
To help with this, D3 uses the concept of layouts. One of the layouts is the pie layout:
It is basically a function that transforms our regular data into a format suitable for rendering SVG arcs, which will represent the slices of our pie chart.
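As a rough idea of what such a transformation computes, here is a plain-JavaScript sketch. pieAngles is a hypothetical helper that keeps the input order and ignores the sorting and other options of the real layout; it only shows how values turn into start and end angles in radians.

```javascript
// Hypothetical helper: turn raw values into slices with angles in radians.
function pieAngles(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let cumulative = 0;
  return values.map((value) => {
    const startAngle = (cumulative / total) * 2 * Math.PI;
    cumulative += value;
    const endAngle = (cumulative / total) * 2 * Math.PI;
    return { value, startAngle, endAngle };
  });
}

// Two quarter slices followed by one half slice.
const slices = pieAngles([1, 1, 2]);
```

Each slice spans a share of the full circle (2π radians) proportional to its value, and the last slice always ends at 2π.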
The code below binds a generic group (g) element to each entry of our dataset. It also translates the pie chart, because all coordinates are computed taking the center of the circle as the origin (0, 0).
Now we can append the actual wedge (drawn as an SVG path element), which can be easily created with the
We can use d3.scale.category10() to generate a set of up to 10 distinct colors, one per slice.
Some magic seems to be going on here: nowhere do we set the start and end angles of our slice. I had to dig into the source code to realize that arc doesn’t represent the actual arc, but an arc generator. When we set the attribute d, we’re actually calling the function arc(), and the data is passed to this function. The startAngle and endAngle properties are set by the pie layout.
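The generator idea can be sketched in plain JavaScript. arcGenerator below is a hypothetical, simplified version: it assumes an inner radius of 0, measures angles clockwise from 12 o'clock, and does not handle a slice that covers the full circle. It is not D3's implementation.

```javascript
// Hypothetical arc generator: configure it once, then call it per datum.
function arcGenerator(outerRadius) {
  const round2 = (v) => Math.round(v * 100) / 100;
  // Angle 0 points up; angles grow clockwise (SVG's y axis points down).
  const point = (angle) => [
    round2(outerRadius * Math.sin(angle)),
    round2(-outerRadius * Math.cos(angle)),
  ];
  return (d) => {
    const [x0, y0] = point(d.startAngle);
    const [x1, y1] = point(d.endAngle);
    const largeArc = d.endAngle - d.startAngle > Math.PI ? 1 : 0;
    // Move to the center, line to the rim, arc along the rim, close.
    return `M0,0L${x0},${y0}A${outerRadius},${outerRadius} 0 ${largeArc},1 ${x1},${y1}Z`;
  };
}

const arc = arcGenerator(100);

// The datum carries startAngle and endAngle, as set by a pie layout.
const quarter = arc({ startAngle: 0, endAngle: Math.PI / 2 });
// quarter is "M0,0L0,-100A100,100 0 0,1 100,0Z"
```

Passing arc itself as the value of the d attribute means it gets called once per bound datum, which is why the angles never appear in our own code.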
With a few other tweaks, such as adding labels, we get the following pie chart:
GeoJSON is a JSON-based format for describing geographic features. For example, in a US map, each state has its own entry in the JSON, defining a set of coordinates that, when projected, become a polygon tracing the boundary of the state.
GeoJSON data is usually large, so it makes sense to load it from a file. We can start by doing:
This file contains only the map geometry, so we need to join it with another file, for example a CSV file containing state names and some metric, like agricultural productivity (as in Chapter 12 of the book). So after we have our map info loaded, we can also load the real data:
And before rendering we merge the data into the GeoJSON:
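The merge step is ordinary JavaScript and can be sketched on its own. The field names (state, value, properties.name) and the sample figure are made up for illustration; the real files may use different names.

```javascript
// Hypothetical merge: copy each CSV row's metric onto the GeoJSON feature
// whose name matches the row's state.
function mergeValues(geo, rows) {
  // CSV fields arrive as strings, so convert the metric to a number.
  const valueByState = new Map(
    rows.map((row) => [row.state, Number(row.value)])
  );
  geo.features.forEach((feature) => {
    const name = feature.properties.name;
    if (valueByState.has(name)) {
      feature.properties.value = valueByState.get(name);
    }
  });
  return geo;
}

// Tiny made-up inputs standing in for the loaded files.
const geo = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { name: "Alabama" }, geometry: null },
    { type: "Feature", properties: { name: "Alaska" }, geometry: null },
  ],
};
const rows = [{ state: "Alabama", value: "1.19" }];

mergeValues(geo, rows);
```

Features without a matching row, like the second one here, simply end up without a value property, which the rendering code has to account for.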
Now we’re ready to generate the SVG elements:
The only missing piece here is the color, a scale that maps values from the data onto a discrete set of colors:
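A mapping from a continuous value to a discrete set of colors can be sketched as follows. quantizeScale is a hypothetical plain-JavaScript helper, and the domain and the four-color palette are assumptions chosen for illustration.

```javascript
// Hypothetical quantizing scale: split the domain into equal-sized buckets,
// one per color, and return the color of the bucket that x falls into.
function quantizeScale(domain, colors) {
  const [d0, d1] = domain;
  return (x) => {
    const bucket = Math.floor(((x - d0) / (d1 - d0)) * colors.length);
    // Clamp so that out-of-range values (and x === d1) get an end color.
    const clamped = Math.max(0, Math.min(colors.length - 1, bucket));
    return colors[clamped];
  };
}

// Made-up palette from light to dark green.
const palette = ["#edf8e9", "#bae4b3", "#74c476", "#238b45"];
const color = quantizeScale([0, 100], palette);

const lightest = color(10); // first bucket, [0, 25)
const darkest = color(90);  // last bucket, [75, 100]
```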
The complete code with additional data added as circles can be seen on GitHub.
D3.js is a very neat library and fun to work with. I’ve learned a lot about D3 and SVG while writing this post, and also became aware of the effort to standardize computational cartography (GeoJSON). I’m super excited to try more examples, build things of my own, and possibly contribute to the project.
My research in grad school was related to proportional symbol maps, and I was surprised that one of the examples consisted of constructing a proportional symbol map with circles.