Continuing a series of posts on Dan Roam’s The Back of the Napkin…
While Roam’s definition of map is fairly broad, his use of the term multivariable plot is very specific. He uses this term to refer to bubble charts. Here’s an example of a bubble chart, expertly presented by Hans Rosling of Gapminder:
Seriously, watch the video. You won’t regret it. If you’re in a hurry, the real data visualization starts just before 5 minutes in.
Did you see how many variables Rosling was able to include all at once in his bubble chart?
- Life expectancy – vertical axis
- Income per person – horizontal axis
- Population – bubble size
- Region – bubble color
- Time – animation
That’s four quantitative variables and one categorical variable, all wrapped up in a single chart. Okay, so it’s animated. But if you take out time as a variable, you still have a great visualization of a four-dimensional data set.
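If you'd like to see that mapping spelled out, here's a minimal Python sketch of how one time slice of such a chart might be encoded. To be clear, this is my own illustration and not how Gapminder builds its charts: the country names, numbers, and color palette are all made up, and `bubble_radius` and `encode` are hypothetical helpers. It also assumes the common convention of scaling a bubble's area (rather than its radius) to the value it represents, so that a country with four times the population gets four times the ink.

```python
import math
from dataclasses import dataclass

# Hypothetical palette: one color per region (the categorical variable).
REGION_COLORS = {"Region A": "blue", "Region B": "orange"}


@dataclass
class Country:
    name: str
    income: float           # horizontal axis
    life_expectancy: float  # vertical axis
    population: float       # bubble size
    region: str             # bubble color


def bubble_radius(population, max_population, max_radius=40.0):
    """Return a radius such that bubble AREA is proportional to population.

    Area grows with the square of the radius, so we take a square root.
    (Assumption: area scaling is the convention we want.)
    """
    return max_radius * math.sqrt(population / max_population)


def encode(countries):
    """Map each country's four variables onto four visual channels."""
    biggest = max(c.population for c in countries)
    return [
        {
            "label": c.name,
            "x": c.income,
            "y": c.life_expectancy,
            "radius": bubble_radius(c.population, biggest),
            "color": REGION_COLORS[c.region],
        }
        for c in countries
    ]


# Made-up sample data for a single year (not real figures).
sample = [
    Country("Alpha", 1000.0, 55.0, 4_000_000, "Region A"),
    Country("Beta", 20000.0, 78.0, 1_000_000, "Region B"),
]
marks = encode(sample)
for mark in marks:
    print(mark)
```

Each dictionary in `marks` describes one bubble, and you could hand those values to whatever plotting tool you prefer. Animating over time would just mean running `encode` once per year of data.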
In Roam’s examples, he doesn’t have the option of animation. (It’s a printed-on-dead-trees book, after all.) So to represent change over time, he uses color (okay, shades of grey, since it’s a black-and-white book). This year’s data shows up as white bubbles. Next year’s projected data shows up as grey bubbles. This means Roam gives up color as a way to show a categorical variable and uses it instead to distinguish two values of the time variable. But again, that’s a four-dimensional data set in one convenient chart.
Why represent data using these multivariable plots? Did you watch the video? These plots can often surface complex relationships and stories that might otherwise be hidden in the data. In the video, Rosling uses an animated bubble chart to convey those stories, but if you’re the one trying to make sense of multivariable data, plots like these might help you discover those stories.
Here’s another example from some of my Vanderbilt colleagues, Stella Flores (Public Policy & Higher Education) and Jacob Thornton (Geographic Information Systems):
This is something of a map, of course, but the story told by the data comes through in the bubbles in this image. The bigger the bubble, the more educated the state is. The lighter the blue of a state, the lower the poverty rate in that state. Check out Texas: we can see very quickly that things are grim there. In contrast, check out Wyoming: well educated and well off. But take a look at South Dakota just next door. Two neighboring states with very different education and poverty levels. I wonder why that is?
Roam provides a few tips for constructing multivariable plots, including a suggestion to start simple and add in variables until you have three or four represented. He notes that too few variables result in a simple bar chart, which is useful in some contexts but less useful for answering the “Why?” question that drives multivariable plots. Too many variables and you’ll lose yourself. Roam also provides the classic correlation-is-not-causation warning, which is particularly relevant when you go looking for stories in multivariable data.
Want to create your own multivariable plots, complete with animation? Try these instructions from Gapminder for using Google’s Motion Chart gadget.
The next time you have a multivariable data set to share with students, consider using a bubble chart of some kind. Or, turn the data over to your students and have them construct the chart and use it to analyze the data. As Roam notes, these charts require some patience to construct, but…
Of the six frameworks and hundreds of picture types out there, a well-thought-through and clearly drawn multiple-variable plot is the most powerful and insightful we can create.