Reference: James, M. C., Barbieri, F., & Garcia, P. (2008). What are they talking about? Lessons learned from a study of peer instruction. Astronomy Education Review, 7(1).
Summary: This study is a follow-up to James (2006), a study comparing the effects of grading incentives on student discourse during clicker questions. In the earlier study, student conversations were audio-recorded in two different astronomy courses, each taught by a different instructor and each involving a different grading scheme for clicker questions. The main finding of the earlier study was that a low-stakes grading scheme (one in which incorrect answers counted as much as correct answers) encouraged richer student-to-student discussions prior to voting on clicker questions than a high-stakes grading scheme (one in which incorrect answers counted only a third as much as correct answers).
A key drawback to James’ earlier study was that the two courses being compared were different in significant ways other than the grading scheme used for clicker questions. In particular, they had different topics and different instructors. That drawback has been mostly eliminated in the current study by James, Barbieri, and Garcia. In this study, the same instructor taught the same course, an introduction to astronomy course with about 180 students, in two consecutive semesters. In both semesters, clicker questions contributed 12.5% of the students’ overall course grades. In the first semester, incorrect answers counted one-third as much as correct answers, but in the second semester, incorrect answers counted 90% as much as correct answers.
The instructor used a version of the standard peer instruction technique. Students were not asked to vote on clicker questions independently, but were instead asked to discuss the questions in pairs prior to voting. Random samples of students in each semester were audio-recorded during these pair discussions throughout the two semesters. The audio-recordings were analyzed in two different ways to measure “discourse bias,” “the difference between the fractional contributions to a conversation between partners.” For instance, if one partner contributed 70% of the time and the other contributed 30% of the time, then the pair’s discourse bias would be 40%.
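The discourse bias measure described above is simple arithmetic: the absolute difference between the two partners' fractional contributions to a conversation. A minimal sketch (the function and variable names are mine, not the authors'):

```python
def discourse_bias(contrib_a, contrib_b):
    """Absolute difference between two partners' fractional
    contributions (e.g., idea counts or word counts)."""
    total = contrib_a + contrib_b
    if total == 0:
        return 0.0
    return abs(contrib_a - contrib_b) / total

# A pair where one partner contributes 70% of the conversation
# and the other 30% has a bias of 0.40, i.e., 40%.
print(discourse_bias(70, 30))  # → 0.4
```

The same computation applies whether contributions are measured by coded ideas or by word counts, the two measures the authors used.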
First, each idea shared by the students during the discussions was coded according to ten categories, including categories such as restating question elements, stating answer preferences, and providing justifications. (One side finding was that there was no correlation between the "type" of clicker question and the nature of the ideas shared by students during the discussion.) Second, the total number of words produced by each student during the discussions was counted. Both techniques provided measures of each student's contribution to the discussions.
The results strongly indicated that the low-stakes grading scheme encouraged more balanced participation by students during pair discussions. For example, when using the first measure of discourse bias (counting ideas), the average bias for the high-stakes class was 33.2%, whereas the average bias for the low-stakes class was 19.5%. That is, conversations in the high-stakes class were more likely to be dominated by one member of the pair. The second measure of discourse bias (counting words) yielded similar results: an average bias of 39.8% in the high-stakes class and 26.6% in the low-stakes class.
The authors also note that the low-stakes grading scheme promoted more independent student responses to clicker questions following pair discussions. In the high-stakes class, only 7.6% of the time did two partners submit different answers to clicker questions, whereas in the low-stakes class, this occurred 17.1% of the time. The authors conclude from this that in the high-stakes class, students’ concern for earning points motivated them to submit their partners’ answers to clicker questions even when they didn’t really believe those answers.
Comments: This study improves on James' earlier study and provides persuasive evidence that low-stakes grading schemes for clicker questions promote more meaningful student participation in small-group discussions prior to voting. True, this wasn't a double-blind, randomized control group experiment (in which students would be randomly assigned to the two grading schemes and the instructor wouldn't know which grading scheme was used with each group of students), but such experiments are practically impossible to implement in educational settings. Short of that "gold standard," this is a very well-designed and persuasive study, in part because many of the possibly confounding variables in the earlier study were eliminated and in part because of the use of direct, qualitative measures of student participation.
Willoughby and Gustafson (2009) conducted a similar study in physics courses, audio-recording student discussions in some sections and not in others. They found that students in the sections that were not audio-recorded "block-voted" more when high-stakes grading schemes were used for clicker questions and less when low-stakes grading schemes were used. In the sections where audio-recorders were used, there was no statistically significant difference in block-voting rates. They concluded that the presence of the audio-recorders might have influenced student voting behaviors (an example of the Hawthorne effect). If true, it's possible that the difference James, Barbieri, and Garcia found in block-voting behavior might have been even greater had audio-recorders not been used, all the more reason to use low-stakes grading schemes when using clickers for formative assessment.
I would have liked to have seen a little additional information provided in the article about the use of clicker questions in these courses. Were students asked to vote on their own before discussing the clicker questions in pairs? (I don't think they were, but this isn't stated in the article.) Also, what instructions were given to students prior to the peer discussion times? I've seen some evidence (Lucas, 2009) and heard some advice (from Doug Duncan) that the instructions given to students prior to peer instruction can affect the quality of the discussions. And while it was found here that the type of clicker question did not correlate with the kinds of ideas shared during peer instruction, it would have been informative to know what kinds of clicker questions were used.
What's not directly addressed in this article, however, is the assumption that more student participation during small-group discussion of clicker questions leads to greater student learning. This was an issue I raised in my comments on James' earlier study, which included data on student performance on final exams. In response to that study, I asked whether greater class participation led to greater student learning or whether students who knew the material better simply dominated class discussions. While it's possible that the latter is true, evidence from non-clicker studies strongly suggests that more active participation in class discussions leads to greater student learning. I wish this assumption (that participation leads to student learning) had been stated as such in the article.
The takeaway here is that low-stakes grading schemes for clicker questions lead to greater student participation and to clicker question results that more accurately reflect students' actual understanding (or lack of understanding). These results have important implications for instructors using clickers to motivate student participation and inform agile teaching choices.