Three Recent Studies on Student Learning with Generative AI

“a college student studying at a desk surrounded by multiple robot tutors, in the styles of a 1950s comic book,” via Midjourney

Next week I’m speaking on a panel organized by Colleen Flaherty for an Inside Higher Ed webinar titled “AI and Student Learning: What We Know (and What We Don’t).” It’s scheduled for Wednesday, April 10th, at 1pm Central, and it’s free! I’m excited to share my perspectives on this topic and to learn from my co-panelists, who have been engaged in very interesting AI projects.

As I prepare for that panel, I’ve been looking through Lance Eaton’s recent Substack posts about research on AI and teaching and learning. Lance has been using generative AI to “pre-read” recent research in this category. That is, he’s been giving AI some prompts to go and read new studies and report back particular elements of those studies as a way to decide which studies to read in depth. It’s an interesting experimental use of AI, and his posts have pointed me to a few studies I want to highlight here on the blog. Unlike Lance, I won’t be using AI to summarize these studies. Instead, I’ll share my own summaries and observations.

Darvishi, A., Khosravi, H., Sadiq, S., Gašević, D., & Siemens, G. (2024). Impact of AI assistance on student agency. Computers & Education, 210, 104967.

Darvishi and colleagues note that some research indicates that spellcheckers can help correct spelling errors, but they don’t do much to teach people how to spell. Is the same true for AI learning assistants?

The researchers designed a study in which students engaged in peer review of student-created learning resources using a platform called RiPPLE. The platform was rigged up to provide specific kinds of AI-generated recommendations to students as they reviewed their peers’ work, including recommendations to make suggestions for improvement, to make comments more relevant to the artifact being reviewed, and to avoid generic comments. These recommendations were made if the AI detected a deficiency in any of these areas as students were composing their reviews.

For four weeks, students participated in the peer review with these AI recommendations. Then students were split into a number of control and treatment groups. Some continued receiving AI recommendations, some didn’t, and some were given self-monitoring checklists (reminding them to do things like provide suggestions for improvement), some weren’t. The study was fairly large, with 1,625 students involved and over 11,000 peer reviews completed by those students.

The big finding was that the “AI prompts played a significant role in maintaining the quality of the students’ feedback.” For example, students who continued to receive the AI assistance had fewer comments flagged for review than students who lost the AI assistance. Another finding that caught my eye was that “the self-monitoring checklists helped students avoid making mistakes and provided them with a way to self-regulate their learning.” The researchers inferred that, in part, from the fact that the flag rate for students who received the self-monitoring checklists was lower than the flag rate for students who didn’t.

To frame these results using a metaphor I’ve used before, we see that when someone learning to ride a bicycle has their training wheels removed, they don’t ride as well! If, however, some other form of assistance is provided in place of the training wheels, this effect can be mitigated, though not entirely eliminated. I welcome your thoughts about scaffolding students’ use of AI in the comments below!

Habib, S., Vogel, T., Anli, X., & Thorne, E. (2024). How does generative artificial intelligence impact student creativity? Journal of Creativity, 34(1), 100072.

This study looks at the use of generative AI to enhance creativity. The authors reference a previous study I had seen in which ChatGPT’s creative potential was evaluated using the Alternative Uses Test (AUT), in which participants are asked to come up with novel uses for everyday objects like bricks and pens and paperclips. That study showed that “on the whole humans currently outperform GPT-3 when it comes to creative output,” but Habib and colleagues wanted to explore what happens when humans use generative AI to help them be creative.

The research involved one hundred students in a “Creative Thinking & Problem Solving” course who took the AUT at the beginning of the course and then again four weeks later using generative AI to help them come up with novel uses for paperclips. I really wish the researchers had used a control group (without AI) and a treatment group (with AI), since the students were presumably learning about creativity during those four weeks. They didn’t, but they did collect some interesting reflective comments from the students.

My takeaway from the study is that using generative AI like ChatGPT to foster creativity, at least as measured by novel uses of paperclips, is a mixed bag. On the one hand, some students indicated how useful it was to have ChatGPT help them “broaden the scope of their ideas,” while others felt that it was hard to come up with anything creative after looking at twenty ideas the chatbot had just generated. “These reflections,” the researchers write, “highlight how reliance on AI can result in fixation of thought actually limiting rather than expanding possible ideas.”

Similarly, the study showed that students using generative AI for this creative task came up with a lot more ideas and did so more quickly than they did earlier in the course without the AI, but that they didn’t come up with the kind of out-of-the-box ideas that Sir Ken Robinson references in his famous TED Talk, e.g. asking “Could the paper clip be 200 feet tall and be made out of foam rubber?” Some prompt engineering here could be useful (“Hey, ChatGPT, what could you do with a 200-foot-tall paperclip made of foam rubber?”), but students would have to think of that out-of-the-box prompt to begin with.

One caveat to this study and the earlier one they cite assessing ChatGPT’s creativity is that both used GPT-3, which isn’t the latest and greatest version of generative AI.

Pellas, N. (2023). The Effects of Generative AI Platforms on Undergraduates’ Narrative Intelligence and Writing Self-Efficacy. Education Sciences, 13(11), Article 11.

Where the first study focused on peer review and self-regulation and the second one focused on creative thinking, this research by Pellas looked at students’ narrative intelligence, that is, their understanding of storytelling, and writing self-efficacy. Pellas conducted a study in which education majors were asked to create digital stories, with one group using “traditional” digital storytelling tools like Storybird, Storyjumper, and Mixbook and one group using AI-enhanced digital storytelling tools like Sudowrite, Jasper, and Shortly AI. The study involved pre- and post-assessments using existing instruments for measuring narrative intelligence and writing self-efficacy.

The term “narrative intelligence” was a new one to me, but I’m glad to know Pellas had both an operational definition (involving such things as emplotment, characterization, narration, generation, and thematization) and a validated instrument for measuring it, the Narrative Intelligence Scale (NIS). Similarly, writing self-efficacy was assessed using the Situated Academic Writing Self-Efficacy Scale (SAWSES) which involves the factors writing essentials, relational reflective writing, and creative identity. (You know the instruments are solid when they’re referred to primarily by their initials!)

Not unlike the Darvishi study, this one found that students using the AI-enhanced storytelling tools showed higher levels of narrative intelligence, but Pellas notes that “long-term effects and whether the observed improvements are sustained over time were not explored.” Darvishi’s study would indicate that perhaps these improvements would not be sustained with the removal of the AI tools, unless some other scaffolding was provided to students.

Measures of writing self-efficacy were also higher in the experimental group, but not across the board. In particular, on the creative identity component of the SAWSES, there was no significant difference between use of the “traditional” storytelling platforms and the AI-enhanced platforms. That would seem consistent with the Habib et al. study above in which use of generative AI for creativity was a mixed bag.

Since ChatGPT was only released to the public 16 months ago, we are still in the early days of educational research on the use of generative AI in teaching and learning. These three studies, however, are getting into the weeds in useful ways, and I’m glad to see this kind of research being published. We’re not ready to establish best practices in learning with AI, but we’re getting a clearer sense of the complexities of learning with these tools.
