Generative artificial intelligence (AI) has the potential to scale up personalized tutoring through large language models (LLMs). Recent AI tutors are adapted to the tutoring task by training or prompting LLMs to follow effective pedagogical principles; however, they are not trained to maximize student learning over the course of a dialogue, and may therefore engage with students suboptimally. We address this limitation by introducing an approach to train LLMs to generate tutor utterances that maximize the likelihood of student correctness, while still encouraging the model to follow good pedagogical practice. Specifically, we generate a set of candidate tutor utterances and score them using (1) an LLM-based student model to predict the chance of correct student responses and (2) a pedagogical rubric evaluated by GPT-4o. We then use the resulting data to train an open-source LLM, Llama 3.1 8B, using direct preference optimization. We show that tutor utterances generated by our model lead to significantly higher chances of correct student responses while maintaining the pedagogical quality of GPT-4o. We also conduct qualitative analyses and a human evaluation to demonstrate that our model generates high-quality tutor utterances.
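The candidate-scoring and preference-pair construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `student_correctness_score` and `pedagogy_rubric_score` are hypothetical stand-ins for the LLM-based student model and the GPT-4o-evaluated rubric, and the weighting scheme `alpha` is an assumption made here for clarity.

```python
def student_correctness_score(utterance: str) -> float:
    """Hypothetical stand-in for the LLM-based student model that
    predicts the chance of a correct student response."""
    # Toy heuristic for illustration only.
    return min(1.0, 0.1 * len(utterance.split()))

def pedagogy_rubric_score(utterance: str) -> float:
    """Hypothetical stand-in for the pedagogical rubric that the
    paper evaluates with GPT-4o."""
    # Toy heuristic: reward utterances that ask the student a question.
    return 1.0 if "?" in utterance else 0.5

def build_preference_pair(candidates, alpha=0.5):
    """Rank candidate tutor utterances by a combined score and return a
    (chosen, rejected) pair suitable for direct preference optimization."""
    scored = sorted(
        candidates,
        key=lambda u: alpha * student_correctness_score(u)
                      + (1 - alpha) * pedagogy_rubric_score(u),
        reverse=True,
    )
    return scored[0], scored[-1]  # best vs. worst candidate

# Example: three candidate tutor utterances for one dialogue turn.
candidates = [
    "The answer is 12.",
    "What do you get if you multiply 3 by 4?",
    "Try breaking the problem into smaller steps. What is 3 times 4?",
]
chosen, rejected = build_preference_pair(candidates)
```

Pairs of this form (`chosen` preferred over `rejected`) are the kind of training signal direct preference optimization consumes; in the paper, the scores come from the student model and rubric rather than these toy heuristics.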