Recently, I blogged about my spring semester statistics course. Although the social bookmarking and pre-class reading assignments in the course are occupying most of my time (largely due to the technical difficulties that emerge when you try to get 75 students to use a new online tool), I’m still getting a lot of mileage out of clicker-facilitated peer instruction during my class sessions.
For instance, last Wednesday we tackled the subject of experimental design. I wanted to make sure students understood the limitations of non-random samples, so I asked a clicker question about an issue I hear a lot of faculty worry about:
Professor X just finished teaching a physics course with 100 students, only 35 of whom completed the course evaluation. For the evaluation item “Give an overall rating of this instructor”, Professor X’s average rating was 2.8 on a scale of 1 (poor) to 5 (excellent). If all of Professor X’s students had completed the course evaluations, his average rating for this item would probably have been…
- Less than 2.8.
- Equal to 2.8.
- Greater than 2.8.
- There’s no way to know given these data.
On the first vote, which was conducted without the students discussing the question with each other, 63% of the students chose the correct answer, “There’s no way to know given these data.” The rest of the votes were split among the other three answers. Because I previewed these results before the students could see them (by switching the projector from my laptop to the classroom desktop computer), I was able to decide not to show them to the class. This is a classic example of the kind of bar graph that’s likely to inhibit discussion: one answer was clearly more popular than the others, but almost 40% of the students had the wrong answer, so the question warranted further discussion. Had I shown this bar graph to the students, it’s likely that a number of students would have assumed the popular answer was the correct one and thus not engaged as deeply in discussion. (This is still just a theory of mine. I haven’t seen any research on this particular aspect of teaching with clickers. Know of any?)
I asked the students to discuss the question with their neighbors and re-vote. Here are the results of this second vote:
There was some convergence to the correct answer, which indicated to me that the peer instruction time was productive. Since option 4 was the most popular answer, I asked for volunteers who selected that answer to explain their reasoning. The volunteers highlighted the main idea here, that the students who completed the course evaluations were a self-selected (and not random) sample of the population of students who took the course. I had to remind students of the relevant term from the textbook (non-response bias), but the students who volunteered had the concept correct. I can’t say for sure that the volunteers did such a great job because of the peer instruction experience just prior, but I suspect that’s the case.
I could have moved on at this point, but I wanted to make sure the students were clear on the troubles with non-response bias. I asked students to hypothesize situations in which the true population average might be “greater than 2.8.” They were quick to point out that students are more likely to complete a course evaluation for a professor they didn’t like than for a professor they did like. If that were the case here, then the evaluations from the other 65 students might have raised Professor X’s average rating. Then I asked students to come up with a situation in which “less than 2.8” might be the true population mean. One student said something about Professor X having a serious health issue, so that the 35 students who completed the evaluations were giving him higher ratings out of sympathy. I floated the idea that Professor X was really tough and the students who loathed him the most were too busy studying for Professor X’s final exam to complete the course evals. I then tried to connect the dots by noting that since we don’t know which of these (or other) situations applies here, we can’t make any inferences about the whole class from this non-random sample.
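To put rough numbers on why “no way to know” is the right answer, here is a minimal sketch of the arithmetic. It assumes only what the question states (100 students, 35 responses averaging 2.8, a 1-to-5 scale) and asks how low or high the class-wide average could be if the 65 missing students had all answered 1 or all answered 5. The function name `mean_rating_bounds` is my own, invented for this illustration.

```python
def mean_rating_bounds(n_total, n_responded, observed_mean, lo=1, hi=5):
    """Return the lowest and highest possible class-wide mean rating.

    The non-respondents' ratings are unknown, so we bracket the answer
    by assuming they all gave the minimum or all gave the maximum.
    """
    n_missing = n_total - n_responded
    observed_sum = n_responded * observed_mean
    lowest = (observed_sum + n_missing * lo) / n_total
    highest = (observed_sum + n_missing * hi) / n_total
    return lowest, highest


lowest, highest = mean_rating_bounds(100, 35, 2.8)
print(f"{lowest:.2f} to {highest:.2f}")  # prints "1.63 to 4.23"
```

The possible range runs well below and well above 2.8, so without knowing something about who chose not to respond, the observed average alone doesn’t tell us which side the true average falls on.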
Note that by asking students to share reasons for “less than 2.8” and “greater than 2.8”, I made it easy for students who had selected those answer choices during the polling to share their perspectives on this question with the entire class. Similarly, I pointed out that if the sample had been random, then “equal to 2.8” would have been a reasonable response, letting students who selected that answer feel somewhat more justified.
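The contrast between a random sample and a self-selected one can also be sketched with a small simulation. Everything in it is hypothetical: the class of 100 ratings and the response probabilities are numbers I made up for illustration (chosen so that unhappy students respond more often and the responders average roughly 2.8), not data from any real course.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical class of 100 ratings (invented for illustration, not real data).
population = [1] * 10 + [2] * 15 + [3] * 25 + [4] * 30 + [5] * 20
true_mean = mean(population)  # 3.35 for this made-up class

# Assumed response probabilities: the unhappier the student, the likelier a response.
response_prob = {1: 0.70, 2: 0.50, 3: 0.30, 4: 0.25, 5: 0.20}


def random_sample_mean(n=35):
    """Average rating from a simple random sample of n students."""
    return mean(random.sample(population, n))


def self_selected_mean():
    """Average rating when each student decides independently whether to respond."""
    responders = [r for r in population if random.random() < response_prob[r]]
    return mean(responders) if responders else None


trials = 2000
avg_random = mean(random_sample_mean() for _ in range(trials))
biased_means = [m for m in (self_selected_mean() for _ in range(trials)) if m is not None]
avg_biased = mean(biased_means)

print(f"true mean:            {true_mean:.2f}")
print(f"random samples:       {avg_random:.2f}")
print(f"self-selected groups: {avg_biased:.2f}")
```

Under these assumptions, the random samples center on the true class average, while the self-selected groups land well below it. Flip the response probabilities and the bias runs the other way, which is exactly why the direction can’t be read off the evaluations themselves.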
In case you’re wondering about that claim that students are more likely to complete course evaluations for professors they dislike than for professors they like: most of my students agreed with it. (To be precise, I didn’t poll them on this point with the clickers, but I saw lots of nodding heads when I asked them about this.) Response rates for course evaluations around here are usually better than 35% (closer to 65%), so non-response bias is less of an issue, but I’ll be sure to bring this anecdote up in the future with faculty who suffer from poor response rates on their course evaluations.
Finally, after this discussion of non-response bias, I showed my students a screen capture of RateMyProfessors.com. I got the impression, however, that they were already aware that the ratings on that site aren’t necessarily representative. Several students claimed that RMP only shows the extremes: ratings from students who really like or really dislike a given professor.
I asked several other very interesting clicker questions in recent days. I’ll try to blog about a few more of them this week.
Image: “Actual Is Not Normal,” Kevin Dooley, Flickr (CC)