How Do We Define Quality Student Work in an Age of AI?
José Antonio Bowen and C. Edward Watson have a new book out called Teaching with AI: A Practical Guide to a New Era of Human Learning. I’m very impressed that Bowen and Watson and their publisher, Johns Hopkins University Press, produced such a thoughtful and comprehensive book on this topic so quickly. It’s well-researched and well-written, which isn’t surprising for these two, who have a strong history of collaboration. The book can’t account for changes in AI technology in the last few months, but if you’re looking to survey the landscape of AI, teaching, and learning in higher ed as it was in January or February 2024, Teaching with AI is a great way to do that.
Where the book falls a little short is in its lack of examples from actual instructional practice. Bowen and Watson cite a handful of articles and papers written by faculty about their experiences teaching with and about (and against) AI, but it doesn’t appear the authors’ accelerated publication timeline allowed for original interviews with instructors in the trenches figuring out what works and what doesn’t for their particular teaching contexts. That’s understandable, given the goals of the book, but several times I was left wondering if their very thoughtful suggestions for AI-related activities and assignments would play out as expected with real students.
Teaching with AI also takes what I might call a non-directive approach in most of its chapters. That is, Bowen and Watson rarely say that higher education faculty must do this or that. Mostly they surface the issues and questions around AI that faculty should be thinking about, along with lots of suggestions for prompts or assignments or teaching approaches that respond to those issues and questions. This makes a lot of sense given where we are as a teaching community. We’re not in a position yet to define any “best” practices, and even “promising” practices are a little wobbly given how fast the field of generative AI is changing.
That non-directive approach shifts, however, in Bowen and Watson’s chapter on grading. In my workshops and presentations on teaching and AI, I’ve been quoting writing instruction expert John Warner from late 2022: “We may have to get used to not rewarding pro forma work that goes through the motions with passing grades,” given that generative AI can handle that pro forma work with ease. Bowen and Watson take this a step further on the second page of their grading chapter: “It is no longer enough to include, for example, the ability to write effectively as a key learning outcome for your current degree or general education program.” Yikes!
Bowen and Watson nuance that a little in the next line: “We’re going to need to be more careful and specific in how we articulate quality as a dimension of our learning outcomes.” That makes sense. Something as general as “effective writing” isn’t likely to be useful as an outcome in an age of generative AI. We need to be more intentional about what we mean by “effective writing” and point students to those specific skills and dispositions. Writing faculty have known this forever, of course, but now any discipline that works with words (or images, for that matter) needs to have sharper articulations of quality work.
Then, two pages later, Bowen and Watson bring the hammer down. “Educators,” they write, “could now give all of their assignments to AI and peg that work to a C grade.” They note that such work isn’t useless but that it has far less value than it did before ChatGPT. “We should go further,” they continue. “Our aim should be to identify C-level work as unacceptable… Rather than banning AI, let’s just ban all C work.” Bowen and Watson take an older writing rubric of theirs (from their Teaching Naked Techniques book) and give the C column a new label: “AI-Level (50%) = F.” Yes, they’re advocating that if students turn in work of the quality that a ChatGPT or Claude can produce, that work should get a failing grade.
I’m just going to let that sit there for a beat.
I’ve heard faculty like John Warner and computer science educator Brett Becker (in his co-authored white paper) argue that we might need to start changing our learning objectives to de-emphasize the tasks that AI can do well (e.g., write moderately complex code) in favor of other tasks that humans will still need to do in an AI-enhanced world (e.g., read and evaluate the quality of AI- or human-written code). But I think Teaching with AI is the first time I’ve heard someone advocate for such a wholesale change to how we think about and grade quality student work.
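To make that shift concrete, here’s a sketch of the kind of code-evaluation task that argument points toward: instead of asking students to write a function from scratch, an instructor might hand them a plausible-looking, AI-style implementation and ask them to find and explain its flaw. The snippet and its bug below are my own illustration, not an example from Becker’s white paper.

```python
# A plausible-looking function an AI assistant might produce.
# Student task: does this correctly compute the median? Explain why or why not.

def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    # Bug: for even-length lists, the median should be the mean of the
    # two middle values, but this returns only the upper one.
    return ordered[mid]

if __name__ == "__main__":
    print(median([3, 1, 2]))     # 2 -- correct for odd-length input
    print(median([4, 1, 3, 2]))  # 3 -- should be 2.5
```

Spotting that off-by-one in the even-length case requires exactly the kind of critical reading of code that remains valuable when AI handles the writing.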
Bowen and Watson go on to argue that if we take AI-level quality as a given, “The question as you grade may be: In what ways has the student moved above and beyond what AI produced for them.” They accompany this argument with suggestions (lots of them) for working with students to help them achieve that kind of quality. The following four chapters are all about AI as an aid to learning, and they are full of assignment and activity ideas that leverage AI to help students produce higher-quality work. Seriously, so many ideas. If I were teaching five sections of a writing seminar in the fall, I could probably try out a different idea in each section each day and still not run out.
Should we recalibrate our expectations for student work as Bowen and Watson suggest right now? I tend to be optimistic about technology’s impact on higher education, but I’m experienced enough to say “No, that’s moving too fast.” MOOCs didn’t upend higher education, nor did Second Life. On the other hand, I haven’t seen a transformative technology weave (insinuate?) its way into so many aspects of academic and professional work as fast as generative AI has in the last year and a half. So… Maybe?
Here’s one reason to pause: Generative AI can produce pretty decent results on a variety of tasks (writing and otherwise) that students can struggle to accomplish, but we might still want students to get good at those tasks. Write a three-paragraph summary of this journal article that a first-year undergraduate can understand? ChatGPT will do a better job of that than most first-year undergrads will. Suggest useful components and organization for a ten-minute presentation on a potential solution to a community problem? Claude will generate a more comprehensive list of suggestions than many advanced undergraduate students. And yet those are two tasks that many students should be able to handle. Is it helpful to peg that kind of work to an “F” on our grading scales?
(Sidebar: If you’re the type of online commenter who describes the output of generative AI as “word soup” or “verbal diarrhea,” you’ve not spent enough time experimenting with these tools. Yes, they can produce great nonsense, but when thoughtfully prompted, they can do very sensible things with words.)
Later in their book, Bowen and Watson cite the work of Donald Norman, who wrote about “cognitive artifacts” like pens and maps and computers that change how humans work with information. They also note that David Krakauer has made a distinction between “complementary” cognitive artifacts, which both amplify our performance and strengthen our underlying ability to work with information (think number systems), and “competitive” cognitive artifacts, which might help us work well with information but lead to poorer skills when the artifact is removed. Bowen and Watson then remind me how bad I am at navigating without my GPS. They go on to say, “It is hardly clear which type of cognitive artifact AI will turn out to be.”
There’s the rub for education. How do we help students develop the expertise they will need to work with generative AI to produce something of higher than “AI-level” quality? What kind of scaffolding will students need? Might that scaffolding involve having students produce “AI-level” work along the way toward greater expertise?
I’m sidestepping here all the questions about the role of grading that my colleague Josh Eyler is surely raising as he reads this post. But even without those questions, Teaching with AI by Bowen and Watson raises plenty of important questions educators will need to grapple with, and I appreciate their provocative stance on grading in an age of AI as a way to explore that question space.