AI tutoring can safely and effectively support students: An exploratory RCT in UK classrooms

LearnLM Team, Eedi: Albert Wang, Aliya Rysbek, Andrea Huber, Anjali Nambiar, Anna Kenolty, Ben Caulfield, Beth Lilley-Draper, Bibi Groot, Brian Veprek, Chelsea Burdett, Claire Willis, Craig Barton, Digory Smith, George Mu, Harriet Walters, Irina Jurenka, Iris Hulls, James Stalley-Moores, Jonathan Caton, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Liam McCafferty, Lucy Dalton, Markus Kunesch, Pauline Malubay, Rachel Kidson, Rich Wells, Sam Wheeler, Sara Wiltberger, Shakir Mohamed, Simon Woodhead, Vasco Brazão

One-to-one tutoring is widely considered the gold standard for personalized education, yet it remains prohibitively expensive to scale. To evaluate whether generative AI might help expand access to this resource, we conducted an exploratory randomized controlled trial (RCT) with $N = 165$ students across five UK secondary schools. We integrated LearnLM -- a generative AI model fine-tuned for pedagogy -- into chat-based tutoring sessions on the Eedi mathematics platform. In the RCT, expert tutors directly supervised LearnLM, with the remit to revise each message it drafted until they would be satisfied sending it themselves. LearnLM proved to be a reliable source of pedagogical instruction, with supervising tutors approving 76.4% of its drafted messages making zero or minimal edits (i.e., changing only one or two characters). This translated into effective tutoring support: students guided by LearnLM performed at least as well as students chatting with human tutors on each learning outcome we measured. In fact, students who received support from LearnLM were 5.5 percentage points more likely to solve novel problems on subsequent topics (with a success rate of 66.2%) than those who received tutoring from human tutors alone (rate of 60.7%). In interviews, tutors highlighted LearnLM's strength at drafting Socratic questions that encouraged deeper reflection from students, with multiple tutors even reporting that they learned new pedagogical practices from the model. Overall, our results suggest that pedagogically fine-tuned AI tutoring systems may play a promising role in delivering effective, individualized learning support at scale.
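
The headline comparison above is stated in percentage points, which is an absolute gap between two rates rather than a relative increase. As a minimal sketch of that arithmetic, the snippet below recomputes the gap from the two success rates quoted in the abstract. The function names are hypothetical, and the relative lift is not reported in the abstract; it is derived here only to illustrate the difference between the two measures and says nothing about statistical significance.

```python
def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Absolute gap between two rates given in percent, in percentage points."""
    return round(rate_a - rate_b, 1)


def relative_lift(rate_a: float, rate_b: float) -> float:
    """Relative increase of rate_a over rate_b, in percent (illustrative only)."""
    return round((rate_a - rate_b) / rate_b * 100, 1)


# Success rates on novel problems, as quoted in the abstract.
learnlm_rate = 66.2  # students supported by supervised LearnLM
human_rate = 60.7    # students tutored by human tutors alone

gap = percentage_point_gap(learnlm_rate, human_rate)
lift = relative_lift(learnlm_rate, human_rate)

print(f"Gap: {gap} percentage points")
print(f"Relative lift (derived, not reported in the abstract): {lift}%")
```

The distinction matters when reading the result: a gap of 5.5 percentage points on a baseline of 60.7% corresponds to a relative increase of roughly 9%, so the two figures should not be used interchangeably.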
