One-to-one tutoring is widely considered the gold standard for personalized education, yet it remains prohibitively expensive to scale. To evaluate whether generative AI might help expand access to this resource, we conducted an exploratory randomized controlled trial (RCT) with $N = 165$ students across five UK secondary schools. We integrated LearnLM -- a generative AI model fine-tuned for pedagogy -- into chat-based tutoring sessions on the Eedi mathematics platform. In the RCT, expert tutors directly supervised LearnLM, with the remit to revise each message it drafted until they would be satisfied sending it themselves. LearnLM proved to be a reliable source of pedagogical instruction: supervising tutors approved 76.4% of its drafted messages with zero or minimal edits (i.e., changing only one or two characters). This translated into effective tutoring support: students guided by LearnLM performed at least as well as students chatting with human tutors on each learning outcome we measured. In fact, students who received support from LearnLM were 5.5 percentage points more likely to solve novel problems on subsequent topics (success rate of 66.2%) than those who received tutoring from human tutors alone (success rate of 60.7%). In interviews, tutors highlighted LearnLM's strength in drafting Socratic questions that encouraged deeper reflection from students, and multiple tutors reported that they learned new pedagogical practices from the model. Overall, our results suggest that pedagogically fine-tuned AI tutoring systems may play a promising role in delivering effective, individualized learning support at scale.