Conceptual Populations and Population Means

The clicker question today about the relationship between a sample mean ($$\overline{x}$$) and a population mean ($$\mu$$) was a challenging one. Here’s a bit more context for that question.

Suppose you weigh a rock on a scale several times, each time getting a slightly different reading. Assuming that the physical characteristics of the rock aren’t changing, then these readings can be considered a random sample, taken from a conceptual population consisting of all the readings that the scale could in principle produce.

Finding the sample mean ($$\overline{x}$$) for this sample is straightforward: you add up the readings and divide by the number of readings. However, the population mean ($$\mu$$) isn’t so straightforward. It’s not possible to add up all the readings that the scale could in principle produce, much less divide by the number of such readings. That’s why I objected to the statement in the clicker question that said that sample means and population means are calculated in the same way. You can’t always “calculate” a population mean.
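To put the sample-mean recipe in symbols: if we call the number of readings $$n$$ and the individual readings $$x_1, x_2, \ldots, x_n$$ (labels I’m introducing here just for illustration), then “add up the readings and divide by the number of readings” becomes

$$\overline{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For a conceptual population there’s no finite list of values to put in a sum like this, which is why the same recipe doesn’t carry over to $$\mu$$.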

So what do we mean by “population mean” for conceptual populations? One way to think of it is to define the population mean as the mean of a sample that (somehow, miraculously) follows the population distribution perfectly. However, that requires defining a “population distribution,” which we haven’t done yet. For now, perhaps it’s better to think of the population mean as the expected value of the population.

That rock has, in a sense, a “true” weight. That’s the expected value of the population. We can’t calculate that or even really know what it is. But we can calculate the mean of our sample of readings, and that sample mean is likely to be close to the “true” weight of the rock. Thus the population mean for this conceptual population is this “true” weight (assuming the scale has no systematic bias), since it’s the expected value of any reading we might take.
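If it helps to see this in action, here’s a rough simulation of the rock-weighing scenario. It’s only a sketch, and it’s in Python rather than the R we’ll use in this course. Everything in it is an assumption made up for illustration: the “true” weight of 250 grams, the size of the scale’s random error, and the idea that the error is normally distributed with no systematic bias. In real life we never get to see the true weight, only the readings.

```python
import random
import statistics

# Hypothetical values chosen for illustration; none come from a real scale.
TRUE_WEIGHT_G = 250.0   # the rock's "true" weight (the population mean), in grams
SCALE_NOISE_G = 0.5     # assumed standard deviation of the scale's random error


def take_readings(n, rng):
    """Simulate n readings from an unbiased scale with normally distributed error."""
    return [rng.gauss(TRUE_WEIGHT_G, SCALE_NOISE_G) for _ in range(n)]


rng = random.Random(2012)  # fixed seed so the run is repeatable

# Each sample is a finite list, so its mean can be calculated directly.
for n in (5, 50, 5000):
    sample_mean = statistics.mean(take_readings(n, rng))
    print(f"n = {n:4d}: sample mean = {sample_mean:.3f} g")
```

Each sample mean is an actual calculation on a finite list of readings, while the population mean only appears in the code because we chose it ourselves. Larger samples will tend to give sample means closer to that chosen value.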

Hope that helps, at least a little.

Image: “Dravite,” Craig Elliott, Flickr (CC)

Reading Assignment #3 – Due 1/23/12

In this course, you’ll learn to use the free, open-source statistical software R. Using R directly requires a bit of programming, so we’ll take advantage of the free, open-source program R Studio, which provides a convenient interface to R.

Here’s what you need to do by 8 a.m., Monday, January 23rd:

  • Download and install R.
  • Download and install R Studio.
  • Read the first five pages of this introduction to R from our textbook authors. Try out the software as suggested in this introduction.
  • Answer the reading questions below. (Be sure to log in to the blog before leaving your answers in the comment section below.)
  • Bring your laptop to class on Friday, if practical. (If you don’t bring one, you can work with a partner who did.)

Here are your reading questions, which you should be able to answer whether or not you successfully install and run these programs:

  1. What purpose does the $ in the command “arbuthnot$boys” serve?
  2. Describe in words the result of the command “arbuthnot$boys/(arbuthnot$boys+arbuthnot$girls)”.
  3. At this point, what do you find most confusing about using R and R Studio?

Social Bookmarking Assignment #2 – Due 1/23/12

For your second social bookmarking assignment, look through your classmates’ bookmarks from the first assignment (#dataviz) and leave a comment on one of them. Your comment should address this question: What questions about the data does this visualization lead you to ask? Include the keyword “anyqs” (for “any questions?”) somewhere in your comment.

For Diigo users, look for the “Comment” link just under the bookmark in the Diigo group. For Pinterest users, click on the pin and then look for the “Add a comment” box at the bottom of the page.

To get credit for this assignment, complete it by Monday, January 23rd, before class begins.

Reading Assignment #2 – Due 1/18/12

Here’s your next reading assignment. Read Section 1.4 in your textbook and answer the following questions by 8 a.m., Wednesday, January 18th. Be sure to log in to the blog before leaving your answers in the comment section below.

  1. Why do you think that there’s only one pie chart in the textbook?
  2. What components of each plot in Figure 1.35 do you find most useful?
  3. What’s one question you have about the reading?

Visualizing Ecological Footprint Data

Thanks for the enthusiasm and creativity you brought to the visualization activity today, and for your willingness to break out your phones during class in an entirely on-topic kind of way. Here are a few of the ecological footprint data visualizations that were submitted that I thought were particularly interesting.

(Click on any of these to see a larger version.)

Here’s the one from Curtis and Hayden we discussed during class. The approach they used with circles is similar to the polar-area diagram, a visualization technique first used by Florence Nightingale (the famous nurse) to describe the kinds of deaths that occurred within the British Army during the Crimean War. Here’s Nightingale’s polar-area diagram:

It makes pretty clear that the Russians weren’t the real enemies during the war. Nightingale’s data visualization led to changes in military field hospital procedures used by the British Army. Read more about Nightingale, who was trained as a mathematician before entering nursing, here and here.

Here’s a visualization I didn’t show on the big screen, this one from Taylor and Lester. As with most of the other visualizations, they focused on the issue of resource deficits. They went with a standard set of coordinate axes, with footprint on the horizontal axis and capacity on the vertical axis. This gives the line y=x particular significance: countries above the line have more capacity than they’re using and countries below the line are running ecological deficits. Population is represented by bubble size, which adds more visual meaning, since more populous countries contribute more to the global ecological footprint. Albania may be running a deficit, but the United States’ deficit has a bigger impact. Color is used to indicate income group. (Also of note: Taylor had a set of multi-colored Sharpies with her. Very handy.)
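The y=x idea boils down to a simple comparison, which can be sketched in a few lines of code. This is a minimal illustration in Python, not anything Taylor and Lester used. The `Country` class, the `position` function, the units, and the three placeholder countries are all made up here, and the numbers are not Global Footprint Network figures.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    footprint: float  # horizontal axis, assumed to be hectares per capita
    capacity: float   # vertical axis, assumed to be hectares per capita


def position(country):
    """Say where a country falls relative to the line y = x."""
    if country.capacity > country.footprint:
        return "reserve"   # above the line: more capacity than it uses
    if country.capacity < country.footprint:
        return "deficit"   # below the line: uses more than its capacity
    return "balanced"      # exactly on the line


# Made-up placeholder values for illustration only.
examples = [
    Country("Country A", footprint=2.0, capacity=5.0),
    Country("Country B", footprint=6.0, capacity=3.0),
    Country("Country C", footprint=4.0, capacity=4.0),
]

for c in examples:
    print(f"{c.name}: {position(c)}")
```

Bubble size and color would sit on top of this comparison as extra layers (population and income group), which is what makes the plot richer than a plain table of deficits.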

Here’s another visualization that uses color to represent income group. This one is from Jack, and he took an interesting approach to the challenge of representing deficits and reserves. Each country’s footprint is represented by a thinner, solid-colored bar above the country name. That country’s capacity is represented by the wider, cross-hatched bar that appears “behind” the footprint bar. Clever and easy to read.

I mentioned during class that I didn’t see anyone taking a geographical approach to their visualizations. I clearly didn’t see this visualization by Lauren (who worked with Chris, I think), which takes an abstract approach to geography. The deficit / reserve ratio is pretty clear here, although I’m not sure what the size of each circle means. Is it the number of hectares per capita? Or are the circles scaled by land mass or population, as well?

Here’s a visualization by Kasey and Megan. They tackled the multiple-kinds-of-land-use challenge with stacked bar charts, which work well, and they dealt with their lack of colored Sharpies nicely. Their decision to put biocapacity above a horizontal line and footprint below was one used by several of you. It works, but not perfectly: it’s a bit hard to tell if a country is running a deficit or surplus.

I’ve saved the best (at least in terms of visual appeal) for last:

This one is by Siana and Tim. Their approach is similar to that of Taylor and Lester above, with coordinate axes for footprint and capacity and bubble size for population. What Siana and Tim have added to this is a couple of different sets of well-designed bubbles. At the top, you’ll see they started with circles (as usual) with country flags for easier identification. I’m guessing that was nice, but not visually interesting enough, so they went with people-shaped bubbles for their second visualization, keeping the country flags in the mix for identification. The people-bubbles don’t add any information to the visualization, but they’re a nice design element and they communicate the idea that size represents population more quickly than the circle-bubbles do.

Finally, here’s the interactive visualization of footprint data that I showed in class today.

Data Visualization Links from 1/11/12

Here are links to the data visualizations we looked at in class yesterday:

And here’s that Hans Rosling talk:

Finally, the Global Footprint Network’s 2010 National Accounts data that I distributed at the end of class is available here.

Reading Assignment #1 – Due 1/13/12

Here’s your first reading assignment. Read Sections 1.1 through 1.3 (omitting Section 1.3.7) in your textbook and answer the following questions by 8 a.m., Friday, January 13th. To answer the questions, log in to the blog and leave your answers in a comment. Your comment will only be visible to you, me, and the TAs.

  1. What is the relationship between $$\overline{x}$$ and $$\mu$$?
  2. Is the given variable discrete or continuous? (a) The number of heads in 100 tosses of a coin. (b) The length of a rod randomly chosen from a day’s production. (c) The age of a randomly chosen Vanderbilt student.
  3. When constructing a histogram, why is the choice of bin size (that is, the size of the range of values that are placed in a single bin) important?
  4. What’s one question you have about the reading?

Update: I just (Thursday at midnight) changed a setting in the “Semi-Private Comments” plugin. The comments section of this post should now work as follows: If you’re logged into the blog and leave a comment, that comment will only be visible to you (anytime you’re logged in) and me (since I’m the admin here). If it doesn’t seem to work that way for you, let me know via email.

Second Update: If you have to give your name and email when trying to leave a comment, then you’re not logged in.

Social Bookmarking Assignment #1 – Due 1/18/12

For your first social bookmarking assignment, find and bookmark an example of data visualization. The more complex the data, the better.

For Diigo users, tag your bookmark with “dataviz.” For Pinterest users, include the hashtag “#dataviz” in your bookmark’s description.

To get credit for this assignment, complete it by Wednesday, January 18th, before class begins.

Image: “Interesting Pin,” Derek Bruff, Flickr (CC)

For Wednesday

I was glad to meet you all today in class. Things got a bit noisy at the end of class, so I thought I’d put your homework for Wednesday in writing:

  1. Read through all the posts here on the blog. Bring any questions you have to class on Wednesday.
  2. Register your clicker on OAK (as described here) if you haven’t already done so.
  3. Set up your social bookmarking account. See these instructions for Diigo and Pinterest. (So far, Diigo’s taken a commanding lead over Pinterest. I’d really like to have 8-10 of you use Pinterest for this. It’s a good match for data visualization, and I’d like to see how the two services match up for academic use.)
  4. Create an account here on the blog (instructions) so you can respond to future pre-class reading quizzes.