More from Week 2 of “The History and Future of (Mostly) Higher Education”…
In one of the Week 2 videos, Cathy Davidson offers a critique of what she sometimes calls “standardized testing” and sometimes “multiple-choice testing.” She discusses Frederick J. Kelly, who, in 1914, “came up with a method where anybody could grade a test,” that is, the multiple-choice test. One test is written and distributed to multiple students, and student work is graded by anyone with an answer key–or even by a machine. Boom. Efficiency. And it came at a time when, thanks to a federal law requiring schooling through age 16 as well as high rates of immigration, there were more students than ever in the public school system.
There’s a lot about the standardized, multiple-choice tests used in US education that is worthy of critique. However, I find that Davidson’s frequent conflation of the terms “standardized” and “multiple-choice” obscures some important aspects of this topic. For instance, in the forums, she writes that these standardized tests are “easier to machine grade, which is why Frederich Kelly designed standardized testing in 1914 in the first place.” Actually, what made Kelly’s tests easier to machine grade was the fact that they were multiple choice, not that they were standardized. Standardization, in my mind at least, refers to the use of a particular set of learning objectives over multiple classes–across a school or academic program or state or nation.
Multiple-choice tests, since they’re easy to grade, make standardization (across any scale) more practical, and standardization is often accomplished via multiple-choice tests, which is why the two ideas are often conflated, not just by Davidson, but by many others. But you can have standardization without multiple-choice tests (see, for instance, how essays are graded on AP exams or how an academic department uses a standard rubric in multiple courses to assess program-level learning outcomes), and you can have multiple-choice without standardization–which happens every time a teacher writes his or her own multiple-choice test, responsive to the learning objectives particular to a given course.
Multiple-choice tests have their limitations, of course. They’re no good at assessing student creative output, and, assuming they’re written by an instructor and not the students themselves, they might not assess learning outcomes valuable to students that aren’t also valuable to instructors. But within those limitations, they can be very effective ways to assess student learning. “Multiple-choice” is often conflated not only with “standardized” but also with “factual recall.” Yes, multiple-choice exams can be used to test factual recall, but they can also be used to assess understanding, application, analysis, and (if you’re really clever) evaluation.
The hard part of using a multiple-choice assessment is in writing questions and answer choices that accurately address your learning objectives, just as the hard part of using a free-response question is in evaluating evidence of student learning in the responses they provide. This brings us back to Frederick Kelly. If there’s a scarcity of expert graders, then free-response questions aren’t viable at scale. Multiple-choice questions, on the other hand, work just fine when expert graders are scarce… assuming you’re okay with standardization (using the same test across multiple classes and cohorts of students). That way, you don’t need a lot of experts, just enough to write a good multiple-choice exam. This allows you to overcome the scarce grader problem, but at a cost. The “one test to rule them all” might not be a great match to the learning objectives at play in every classroom. And that’s the objection I hear most from K-12 teachers about standardized testing.
Why is all this attention to terminology important? Because standardized tests have their strengths and weaknesses, and those strengths and weaknesses are related to, but distinct from, the strengths and weaknesses of multiple-choice tests. If we are to be savvy about the kinds of assessments we use, we need to understand these distinctions so we can balance strengths and weaknesses intentionally. For instance, does Cathy Davidson object to the use of standardized, multiple-choice tests because they don’t allow students freedom in expressing their learning (a quality of multiple-choice tests) or because they impose a common set of learning objectives across multiple contexts (a quality of standardized tests)? Maybe both, but some clarity on these issues would be helpful, I think.
I’ll add that even a standardized, multiple-choice test can be very useful. See, for instance, the Force Concept Inventory, a test that covers first-semester physics concepts, where each wrong answer is based on research about commonly held misconceptions. It functions as a widely accepted measurement instrument for comparing the effectiveness of first-semester physics instruction across multiple institutional contexts. As bad as Frederick Kelly’s standardized, multiple-choice test might have been in 1914, the FCI stands as an example of a highly valuable standardized, multiple-choice test in 2014.
Image: “final exam,” dcjohn, Flickr (CC)