I just noticed that visual.ly has been running a health data visualization challenge in recent weeks. You have until Sunday, July 22nd, to submit visualizations of data of your choice from HealthData.gov. Both individual and group entries are invited. The prizes for first, second, and third place are pretty sweet, including books, technology, and conference travel.
Check back here soon for a study guide for the final exam, as well as solutions for the second midterm. Let me know if you have any questions about the final or if there are other materials I can provide that would be helpful.
I wanted to share a few ideas for creating infographics with you as you start work on your application projects.
Option 1 – Use visual.ly, an online tool for creating infographics. I haven’t used it, but I’ve heard great things about it, and I’d love to learn how to use it. I think it’s basically free.
Option 2 – Use R. I’m pretty comfortable troubleshooting in R if you need help. See below for some resources on using R to create data visualizations. You can export your R plots as PNG or PDF files, then insert them into a PowerPoint or Prezi you use to design the infographic as a whole.
Option 3 – Use Excel. I’m pretty comfortable troubleshooting in Excel, too. You would need to export your Excel charts into PowerPoint or Prezi, just like with R plots.
Option 4 – Use some other program, like Adobe Illustrator or Inkscape, its free, open-source alternative. I don’t know these tools, however, so I probably won’t be able to help you if you get stuck. See this list of tools from Steven Anderson and these ideas for creating infographics from Chris Clark for other possibilities.
Whatever tool you use, be sure you can output an infographic that I can embed one way or another here on the blog. Let me know if you have questions about output options.
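If you go with Option 2, here is a rough sketch of how exporting an R plot to PNG and PDF might look. The data and the file names (scatter.png, scatter.pdf) are made up for illustration, so swap in your own. The files are written to your current working directory, which you can check with getwd().

```r
# Made-up example data; replace with the data for your own project.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Export as PNG: open the file device, draw the plot, then close the device.
# For png(), width and height are in pixels by default.
png("scatter.png", width = 800, height = 600)
plot(x, y, main = "Example scatterplot", xlab = "x", ylab = "y", pch = 19)
dev.off()  # nothing is fully written to the file until the device is closed

# Export the same plot as PDF. For pdf(), width and height are in inches.
pdf("scatter.pdf", width = 8, height = 6)
plot(x, y, main = "Example scatterplot", xlab = "x", ylab = "y", pch = 19)
dev.off()
```

PNG is usually the easier format to drop into PowerPoint or Prezi. PDF stays sharp when you zoom in, which can matter if your infographic will be viewed at different sizes.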
Help with R
Flowing Data has some useful tutorials for creating particular data visualizations in R:
The heatmap tutorial includes a section on color selection. R has a few built-in color palettes, including cm (cyan-magenta) and heat (warm colors), both seen in the tutorial. Other palettes include rainbow, terrain, and topo. In R, these are the functions cm.colors, heat.colors, rainbow, terrain.colors, and topo.colors. You can type things like “?rainbow” in R to find out the parameters for these palettes.
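To get a feel for these palettes, here is a small sketch that draws a strip of swatches for each one. The choice of 8 colors per palette is arbitrary, and the list name palettes is just something I made up for this example.

```r
# Number of colors to request from each palette (an arbitrary choice).
n <- 8

# Each built-in palette function returns a character vector of n colors.
palettes <- list(
  cm      = cm.colors(n),
  heat    = heat.colors(n),
  rainbow = rainbow(n),
  terrain = terrain.colors(n),
  topo    = topo.colors(n)
)

# Draw one row of swatches per palette, stacked vertically.
old_par <- par(mfrow = c(length(palettes), 1), mar = c(0.5, 0.5, 2, 0.5))
for (name in names(palettes)) {
  barplot(rep(1, n), col = palettes[[name]], border = NA,
          axes = FALSE, space = 0, main = name)
}
par(old_par)  # restore the previous plot settings
```

Once you find a palette you like, you can pass it to most plotting functions through the col argument, for example col = heat.colors(8).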
See the Quick-R tutorials on basic graphs and advanced graphs for more R commands useful for data visualization. I used the Flowing Data and Quick-R tutorials, along with R’s help files and a little Internet searching, to create the visualizations back on Problem Set 1.
Here’s the Prezi I used during class today to introduce the topic of linear regression. You can step through the Prezi by clicking the forward button, or you can use your mouse to pan and zoom freely throughout the canvas.
One more point about the definition of the line of best fit: We floated a few options today for determining lines of best fit. One of them, the line that minimizes the sum of the squares of the vertical distances between the points and the line, is the standard method, and, as we’ll see, it has a variety of benefits. However, one could pick any of the other methods and see where it leads. That’s the kind of thing mathematicians do: tweak one little thing or change an assumption and see what the logical consequences are. It’s possible that in some applications, one of those other methods of determining lines of best fit would be more useful than the standard method.
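If you want to experiment with this in R, here is a minimal sketch using made-up data. It finds the standard least-squares line two ways (with R's lm function and with the usual slope and intercept formulas), then tries one alternative for comparison: minimizing the sum of the absolute vertical distances. That alternative is my own pick for illustration and may not match the options we floated in class. The helper names sse and sad are also made up.

```r
# Made-up example data.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sum of squared vertical distances between the points and a given line.
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

# Standard method, version 1: let lm() find the least-squares line.
fit <- lm(y ~ x)
b <- unname(coef(fit))  # b[1] is the intercept, b[2] is the slope

# Standard method, version 2: the usual formulas for slope and intercept.
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# An alternative (an assumption for illustration): minimize the sum of the
# absolute vertical distances instead, using a general-purpose optimizer
# started from the least-squares line.
sad <- function(p) sum(abs(y - (p[1] + p[2] * x)))
alt <- optim(c(intercept, slope), sad)

# Compare the two lines on a scatterplot.
plot(x, y, pch = 19, main = "Two candidate lines of best fit")
abline(a = intercept, b = slope, col = "blue")               # least squares
abline(a = alt$par[1], b = alt$par[2], col = "red", lty = 2)  # absolute distances
```

Try nudging the intercept or slope a little and calling sse again. The least-squares line should give the smallest value of sse you can find, while the other line does better on sad.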