Data-Driven Disappointment Part 3: Dealing with data
In part 3 of this series on using the D3 JavaScript library (d3.js), we will use the functionality provided by D3 to group our data, then use that grouped data as the basis for a simple pie chart showing the relative sizes (proportions) of the groups. We will look at a couple of common D3 patterns, including select-data-join and creating and calling generated functions.
Note: if you’re just coming to this series or need a refresher on the story so far, why not check out the first post or the previous post (part 2)?
The starting point
Recall that we have a TSV (tab-separated values) formatted data string in the variable rawData (provided below):
const rawData = `Role LaidOff Functional Hierarchy
Chief Executive Officer FALSE 4
Chief Strategy & People Officer FALSE 3
Director, Fund Development FALSE FUND 2
Director, Partnerships & Program Facilitation FALSE 2
Director of Marketing and Communications FALSE MARCOM 2
Director of Finances & Accounting TRUE FIN 2
Director, Programs FALSE 2
K-12 Program Manager TRUE 1
Adult Program Manager FALSE 1
Manager, Chapters FALSE 1
Senior Manager, Partnerships FALSE 1
Senior Manager, Program Facilitation FALSE 1
Senior Project Manager TRUE 1
Sr Manager Evaluation & Impact Measurement TRUE EVAL 1
Senior People and Culture Manager FALSE HR 1
Senior Marketing Manager FALSE MARCOM 1
Instructional Training Manager FALSE 1
Senior Fund Development Specialist TRUE 0
Sr. Learning Experience Designer TRUE 0
Senior Partner Development Lead TRUE 0
Senior Learning Facilitator TRUE 0
Senior Learning Facilitator TRUE 0
Senior Learning Facilitator TRUE 0
Senior Fund Development Lead FALSE FUND 0
Senior Fund Development Lead FALSE FUND 0
Partnership Development TRUE 0
Lead, Teen Ambassador Program (TAP) TRUE 0
Senior Partnership Development Lead TRUE 0
Senior Bilingual Learning Facilitator TRUE 0
Senior Bilingual Learning Facilitator TRUE 0
Bilingual learning Facilitator TRUE 0
Learning Experience Designer, Adult Programs TRUE 0
Learning Facilitator TRUE 0
Learning Facilitator TRUE 0
Learning Facilitator TRUE 0
Learning Facilitator TRUE 0
Bilingual, Partnership Development TRUE 0
Accountant FALSE FIN 0
People and Culture Coordinator TRUE HR 0
Data Analyst TRUE EVAL 0
Partnerships Coordinator TRUE 0
Marketing Coordinator FALSE MARCOM 0`
And the following starter/setup code (with the data loading integrated):
<!DOCTYPE html>
<script type="module">
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";
// your code goes here
const rawData = `...`; // put the actual data string here
const data = d3.tsvParse(rawData);
const svgWidth = 400;
const svgHeight = 400;
const d3svg = d3.create("svg")
.attr("width", svgWidth)
.attr("height", svgHeight)
.attr("viewBox", [ -svgWidth / 2, -svgHeight / 2, svgWidth, svgHeight ])
;
</script>
<div id="d3-container">
<!-- this will hold whatever D3 generates -->
</div>
Which boils down to:
- Import everything from d3.
- Parse the data out of the TSV string.
- Set up the <svg> element that will be the canvas for the chart. Here, svgWidth and svgHeight are used as variables of convenience.
  - Use d3.create("svg") to create an <svg> element.
  - Assign values using the attr() method to the <svg> element’s width and height attributes, which determine the size of the <svg> element.
  - Assign a value to the viewBox attribute [1]. This takes an array of 4 values corresponding to min-x, min-y, width, and height. min-x and min-y are the lowest values that x and y can take inside the visible area of the <svg> element, i.e. they correspond to the top-left corner. In essence, this is defining the coordinate system within the <svg> element.
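To make the viewBox arithmetic concrete, here is a minimal plain-JavaScript sketch (no d3 required). The helper name centredViewBox is hypothetical, made up for this illustration; it only reproduces the array passed to attr("viewBox", ...) in the setup code and shows which coordinates end up at the corners and the centre.

```javascript
// Hypothetical helper (not part of d3): builds the same
// [min-x, min-y, width, height] array used in the setup code.
function centredViewBox(width, height) {
  return [-width / 2, -height / 2, width, height];
}

const [minX, minY, width, height] = centredViewBox(400, 400);

// The corners and centre of the visible area, in SVG coordinates.
const topLeft = { x: minX, y: minY };
const bottomRight = { x: minX + width, y: minY + height };
const centre = { x: minX + width / 2, y: minY + height / 2 };

console.log(topLeft, bottomRight, centre);
```

Because (0, 0) lands in the middle of the <svg>, a shape drawn around the origin is centred without any extra translation.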
If you’re relatively new to JavaScript, this code (especially for the d3.create() method) might look a little odd, but method chaining (whereby a method is immediately called on the result of the previous method) is a very common pattern in d3. You could also write it on a single line (as below), but I think it’s easier to understand what’s going on and what’s being called when it’s separated out as above.
const d3svg = d3.create("svg").attr("width", svgWidth).attr("height", svgHeight).attr("viewBox", [ -svgWidth / 2, -svgHeight / 2, svgWidth, svgHeight ]);
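If chaining is new to you, the following plain-JavaScript sketch shows why it works. TinyElement is a hypothetical class invented for this example (it is not how d3 is implemented); the only point is that each attr() call returns the object itself, so the next call can be attached directly.

```javascript
// Hypothetical stand-in for a d3 selection, for illustration only.
class TinyElement {
  constructor(tag) {
    this.tag = tag;
    this.attributes = {};
  }

  // Store the attribute, then return `this` so calls can be chained.
  attr(name, value) {
    this.attributes[name] = String(value);
    return this;
  }
}

const svgWidth = 400;
const svgHeight = 400;

const tinySvg = new TinyElement("svg")
  .attr("width", svgWidth)
  .attr("height", svgHeight)
  .attr("viewBox", [-svgWidth / 2, -svgHeight / 2, svgWidth, svgHeight]);

console.log(tinySvg.attributes);
```

If attr() returned nothing, the second .attr() call would fail, which is why chainable methods hand back the object they were called on.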
Grouping the data
Right now our data is stored as an array of objects. To be able to represent it as a pie chart, we need some way to reduce it to single, graphable values. Take the question,
“How many people were laid off from Canada Learning Code in January of 2024?”
What we would need from our data are counts of roles where LaidOff="TRUE" and LaidOff="FALSE". If you’re familiar with SQL, you might be thinking about the GROUP BY clause, which you can use to summarize data by the values of a column. We can achieve something similar using d3.group():
// d3.group( dataSet, property )
const groupedData = d3.group(data, d => d.LaidOff);
This separates the data in our data variable into two groups, "TRUE" and "FALSE", based on the values of the LaidOff property, which is specified by the function provided as the second argument [2] of d3.group().
We can get a sense of the structure of groupedData by converting it to an array with the spread operator (...) or Array.from() [3]:
console.log([...groupedData]); // or console.log(Array.from(groupedData))
// groups appear in the order their keys are first seen in the data
// [
// [ "FALSE", [ // all data where LaidOff === "FALSE" ] ],
// [ "TRUE", [ // all data where LaidOff === "TRUE" ] ]
// ]
The size (length) of the arrays contained in each group would be the number of data points in each group (i.e., the number of roles/positions belonging to each group). The equivalent SQL might look something like:
SELECT LaidOff, COUNT(*) FROM data GROUP BY LaidOff;
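For intuition, grouping and counting can also be written by hand. The sketch below uses a plain Map and a small made-up sample (the four rows are hypothetical, not the real dataset), and groupBy is an illustrative helper rather than d3’s actual implementation.

```javascript
// Hypothetical sample rows, shaped like the objects d3.tsvParse() returns.
const sample = [
  { Role: "Role A", LaidOff: "FALSE" },
  { Role: "Role B", LaidOff: "TRUE" },
  { Role: "Role C", LaidOff: "TRUE" },
  { Role: "Role D", LaidOff: "TRUE" },
];

// Illustrative helper: collect rows into a Map keyed by keyFn(row).
function groupBy(rows, keyFn) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

const grouped = groupBy(sample, d => d.LaidOff);

// Roughly: SELECT LaidOff, COUNT(*) FROM sample GROUP BY LaidOff
const counts = Array.from(grouped, ([key, rows]) => [key, rows.length]);
console.log(counts);
```

Each entry pairs a group label with the number of rows in that group, which is exactly the kind of single value a pie slice needs.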
Generating the pie (and) slices with d3.pie() and d3.arc()
👆 The endpoint for this section. See the code:
<script type="module">
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";
const rawData = `Role LaidOff Functional Hierarchy
Chief Executive Officer FALSE 4
Chief Strategy & People Officer FALSE 3
Director, Fund Development FALSE FUND 2
Director, Partnerships & Program Facilitation FALSE 2
Director of Marketing and Communications FALSE MARCOM 2
Director of Finances & Accounting TRUE FIN 2
Director, Programs FALSE 2
K-12 Program Manager TRUE 1
Adult Program Manager FALSE 1
Manager, Chapters FALSE 1
Senior Manager, Partnerships FALSE 1
Senior Manager, Program Facilitation FALSE 1
Senior Project Manager TRUE 1
Sr Manager Evaluation & Impact Measurement TRUE EVAL 1
Senior People and Culture Manager FALSE HR 1
Senior Marketing Manager FALSE MARCOM 1
Instructional Training Manager FALSE 1
Senior Fund Development Specialist TRUE 0
Sr. Learning Experience Designer TRUE 0
Senior Partner Development Lead TRUE 0
Senior Learning Facilitator TRUE 0
Senior Learning Facilitator TRUE 0
Senior Learning Facilitator TRUE 0
Senior Fund Development Lead FALSE FUND 0
Senior Fund Development Lead FALSE FUND 0
Partnership Development TRUE 0
Lead, Teen Ambassador Program (TAP) TRUE 0
Senior Partnership Development Lead TRUE 0
Senior Bilingual Learning Facilitator TRUE 0
Senior Bilingual Learning Facilitator TRUE 0
Bilingual learning Facilitator TRUE 0
Learning Experience Designer, Adult Programs TRUE 0
Learning Facilitator TRUE 0
Learning Facilitator TRUE 0
Learning Facilitator TRUE 0
Learning Facilitator TRUE 0
Bilingual, Partnership Development TRUE 0
Accountant FALSE FIN 0
People and Culture Coordinator TRUE HR 0
Data Analyst TRUE EVAL 0
Partnerships Coordinator TRUE 0
Marketing Coordinator FALSE MARCOM 0`;
const data = d3.tsvParse(rawData);
const groupedData = d3.group(data, d => d.LaidOff);
const svgWidth = 300;
const svgHeight = 300;
const d3svg = d3.create("svg")
.attr("width", svgWidth)
.attr("height", svgHeight)
.attr("viewBox", [-svgWidth / 2, -svgHeight / 2, svgWidth, svgHeight])
.style("background-color", "lightblue")
;
d3svg.append("g")
.selectAll("path")
.data(
d3.pie().value(d => d[1].length)(groupedData)
)
.join("path")
.attr("fill", "white")
.attr("stroke", "black")
.attr("d", d3.arc().innerRadius(50).outerRadius(100))
.append("title")
.text(d => d.data[1].length)
;
d3.select("#d3-container-chart-1").node()
.append(d3svg.node())
;
</script>
<div id="d3-container-chart-1">
<!-- this will hold whatever D3 generates -->
</div>
We’re almost there. This is where we encounter some d3-specific functionality in the form of data binding (with the data() method) and data joining (with the join() method). Now we need to do the following:
- Use d3.pie() to generate the start and end angles for each slice, based on the size of the groups in our data.
- Create and append an SVG group element, <g>, to act as a container for the pie slices.
- Bind the generated pie slice data to <path> elements in the group that we just created.
- Set the <path> element’s fill and stroke colors by way of the attr() method.
- Use d3.arc() to populate the <path> element’s d attribute, which determines how the <path> is drawn.
- Append a <title> element to each <path> that describes the data that the pie slice is representing.
- Finally, append the SVG to the container <div>.
<!DOCTYPE html>
<script type="module">
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";
// your code goes here
const rawData = `...`; // put the actual data string here
const data = d3.tsvParse(rawData);
const groupedData = d3.group(data, d => d.LaidOff);
const svgWidth = 400;
const svgHeight = 400;
const d3svg = d3.create("svg")
.attr("width", svgWidth)
.attr("height", svgHeight)
.attr("viewBox", [ -svgWidth / 2, -svgHeight / 2, svgWidth, svgHeight ])
;
d3svg.append("g")
.selectAll("path")
.data(
d3.pie().value(d => d[1].length)(groupedData)
)
.join("path")
.attr("fill", "white")
.attr("stroke", "black")
.attr("d", d3.arc().innerRadius(50).outerRadius(100))
.append("title")
.text(d => d.data[1].length)
;
d3.select("#d3-container").node()
.append(d3svg.node())
;
</script>
<div id="d3-container">
<!-- this will hold whatever D3 generates -->
</div>
Binding and joining data
There’s a bit to unpack in that code. First is the d3 pattern of
selection.selectAll().data().join();
which is how some of the future d3 magic works. It makes less sense right now since there are no existing elements that match selectAll("path"), so join("path") simply creates one new <path> for each bound data item. If there were existing elements, this is how d3 would be able to select chart elements and update them with new data based on .data().join("path").
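One way to picture what the join does is to compare how many elements already exist with how many data items were bound. The function below, joinByIndex, is a hypothetical plain-JavaScript illustration of that bookkeeping (matching by position, which is d3’s default when no key function is given); it is not d3’s implementation.

```javascript
// Illustrative only: classify data items and elements by index,
// assuming elements and data are matched by position.
function joinByIndex(existingElements, data) {
  const shared = Math.min(existingElements.length, data.length);
  return {
    // data for elements that already exist and get updated
    update: data.slice(0, shared),
    // data items with no element yet: new elements get created
    enter: data.slice(shared),
    // leftover elements with no data: these get removed
    exit: existingElements.slice(shared),
  };
}

// First render: no <path> elements yet, two slices of data.
const first = joinByIndex([], ["slice 1", "slice 2"]);
console.log(first.enter.length); // 2

// Later render: two existing paths, three slices of data.
const later = joinByIndex(["path 1", "path 2"], ["a", "b", "c"]);
console.log(later.update.length, later.enter.length, later.exit.length); // 2 1 0
```

In our chart only the first case applies, so every slice is treated as new and gets its own <path>.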
Generating pie slice start and end angles
Next is how data is being provided to the data() method: d3.pie() returns a function that can be used with a dataset to generate the appropriate values for startAngle and endAngle. The function created by d3.pie() returns an array of objects based on the input data with the following structure [4]:
// these are generic values, but the structure is what's important
[
{
"data": 1, // the original data point. in our case, this would be an object.
"value": 1, // the value derived from the original data point
"index": 6,
"startAngle": 6.050474740247008, // angle in RADIANS
"endAngle": 6.166830023713296, // angle in RADIANS
"padAngle": 0 // gap between slices also in RADIANS
},
// ...
]
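To see where startAngle and endAngle come from, here is a simplified hand-rolled version. The helper simplePie is hypothetical and ignores options that d3.pie() supports (such as sorting and padding); it only divides a full circle (2π radians) in proportion to each value. The counts 25 and 17 are the sizes of the "TRUE" and "FALSE" groups in the data string above.

```javascript
// Simplified sketch of a pie layout: no sorting, no padding.
function simplePie(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  let angle = 0; // start at 0 radians
  return values.map((value, index) => {
    const startAngle = angle;
    angle += (value / total) * 2 * Math.PI;
    return { value, index, startAngle, endAngle: angle, padAngle: 0 };
  });
}

const slices = simplePie([25, 17]);
console.log(slices);
```

The first slice ends where the second begins, and the last slice ends at 2π, i.e. a full circle.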
As for creating and calling the function returned by d3.pie(), the two approaches below are equivalent:
// create and call the function on one line
d3.pie().value(d => d[1].length)(groupedData);
// ----- pie() function -------|-- argument --|
// or
// create the function and assign it to a variable
const pieGenerator = d3.pie().value(d => d[1].length);
// call the function using that same variable
pieGenerator(groupedData);
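The "create a function, configure it, then call it" pattern is not specific to d3. Below is a minimal sketch of the idea using a hypothetical makeValueGenerator() factory (an invented name, for illustration only): it returns a function that also carries a chainable value() method for swapping the accessor, loosely mirroring how d3.pie().value(...) is used above.

```javascript
// Hypothetical factory, for illustration: returns a callable "generator"
// that can be configured through a chainable value() method.
function makeValueGenerator() {
  let accessor = d => d; // default: use each item as-is

  // The generated function: apply the accessor to every item.
  function generator(data) {
    return Array.from(data, accessor);
  }

  // Configure the accessor, then return the function for chaining.
  generator.value = function (fn) {
    accessor = fn;
    return generator;
  };

  return generator;
}

// Made-up grouped data: a Map of label -> array of items.
const groups = new Map([
  ["TRUE", ["a", "b", "c"]],
  ["FALSE", ["d"]],
]);

// Create and call in one line...
const inline = makeValueGenerator().value(d => d[1].length)(groups);

// ...or create first, call later.
const valueGenerator = makeValueGenerator().value(d => d[1].length);
const viaVariable = valueGenerator(groups);

console.log(inline, viaVariable); // both are [3, 1]
```

Keeping the generator in a variable is handy when the same configuration needs to be reused with more than one dataset.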
We also need to instruct d3 on how to extract (access) the necessary value from the data being provided. Each slice’s share of the pie is its value divided by the sum of all the values. We can use an arrow function (=>) for this purpose.
Recall that our groupedData Map can be accessed as an array of arrays, i.e.:
[
[ "FALSE", [ /* datapoints as objects in this array */ ] ],
[ "TRUE", [ /* datapoints as objects in this array */ ] ]
]
For each group:
- d[0] is the group label, "TRUE" or "FALSE"
- d[1] is the array of datapoints, the length of which is how many datapoints are in the group
So d => d[1].length, as a count of the datapoints belonging to a group, can be used as the value that determines a group’s slice of the pie.
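As a quick check of that reasoning, the sketch below applies the same accessor to a small made-up Map (the empty objects are placeholder datapoints, not the real data) and works out each group’s share of the whole.

```javascript
// Hypothetical grouped data: a Map of label -> array of datapoints.
const groupedSample = new Map([
  ["TRUE", [{}, {}, {}]],
  ["FALSE", [{}]],
]);

const value = d => d[1].length; // the same accessor used with d3.pie()

const entries = Array.from(groupedSample);
const total = entries.reduce((sum, d) => sum + value(d), 0);

// Each group's label, count, and share of the whole pie.
const shares = entries.map(d => ({
  label: d[0],
  count: value(d),
  share: value(d) / total,
}));

console.log(shares);
```

Here the "TRUE" group holds three of the four datapoints, so it would take three quarters of the pie.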
Using d3.arc() to generate paths
The shape of a <path> element is set by the path commands found in the value of its d attribute. The commands for each slice of a pie chart can be generated by the d3-shape helper function, d3.arc(), which itself returns a function.
selection.join("path")
.attr("fill", "white")
.attr("stroke", "black")
.attr("d", d3.arc().innerRadius(50).outerRadius(100))
Like the function created by d3.pie(), the function created by d3.arc() needs to be provided data to be able to generate the code for the <path> elements. We can specify inner and outer radii by chaining the methods innerRadius() and outerRadius() onto d3.arc() (if the inner radius is greater than 0, the generated function will draw a donut shape). When we pass this function to attr(), d3 calls it with the data already bound to each element, which is exactly what we want in this case since each bound object contains both a startAngle and an endAngle.
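To give a feel for what ends up in the d attribute, here is a hand-rolled sketch of a donut-slice path. The function slicePath is hypothetical and much simpler than d3.arc() (no padding, no rounded corners, and it does not handle a slice that spans the whole circle); it assumes angles in radians measured clockwise from 12 o’clock, which is the convention d3.arc() uses.

```javascript
// Illustrative sketch of a donut-slice path; not d3.arc()'s actual code.
function slicePath({ startAngle, endAngle }, innerRadius, outerRadius) {
  // Round to 2 decimals; String() also turns -0 into "0".
  const fmt = v => String(Math.round(v * 100) / 100);

  // Convert an angle and radius to "x,y" in SVG coordinates (y points down).
  const point = (angle, radius) =>
    `${fmt(radius * Math.sin(angle))},${fmt(-radius * Math.cos(angle))}`;

  // SVG arcs need to know whether to take the long way round.
  const largeArc = endAngle - startAngle > Math.PI ? 1 : 0;

  return [
    `M${point(startAngle, outerRadius)}`, // move to the outer edge at the start angle
    `A${outerRadius},${outerRadius} 0 ${largeArc} 1 ${point(endAngle, outerRadius)}`, // outer arc, clockwise
    `L${point(endAngle, innerRadius)}`, // line in to the inner edge
    `A${innerRadius},${innerRadius} 0 ${largeArc} 0 ${point(startAngle, innerRadius)}`, // inner arc, anticlockwise
    "Z", // close the shape
  ].join(" ");
}

// A quarter-circle slice with the same radii as the chart above.
const d = slicePath({ startAngle: 0, endAngle: Math.PI / 2 }, 50, 100);
console.log(d);
```

The resulting string is the kind of value that attr("d", ...) receives for each slice, although d3.arc() formats its output differently.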
That might have seemed like a lot of work to come up with a kind of boring visualization, and you wouldn’t be wrong. Like many coding endeavours, there’s more than a little bit of setup to take care of; however, once that’s out of the way, things can really take off! Continue on to the next post, Data-Driven Disappointment Part 4: Colour and title text, where we’ll spice up our simple pie chart with some text and colours.
[1] More detail about the viewBox attribute from MDN: https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/viewBox
[2] Many of the methods in d3 are looking for some sort of function to define how to make sense of the data that’s been provided: of the properties available for each datapoint, which should be used to answer the question that we’ve asked?
[3] d3.group() produces a Map object, so depending on the environment, using console.log() directly on the result may just log an empty object {} to the console. Learn more in the MDN Web Docs entry on Map.
[4] d3.pie() documentation (https://d3js.org/d3-shape/pie#_pie)