Creating effective educational materials generally requires expensive and time-consuming studies of student learning outcomes. To overcome this barrier, one idea is to build computational models of student learning and use them to optimize instructional materials. However, the cognitive processes underlying learning dynamics are difficult to model. We propose an alternative approach that uses Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes. Specifically, we use GPT-3.5 to evaluate the overall effect of instructional materials on different student groups and find that it can replicate well-established educational findings such as the Expertise Reversal Effect and the Variability Effect. This demonstrates the potential of LMs as reliable evaluators of educational content. Building on this insight, we introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function. We apply this approach to create math word problem worksheets aimed at maximizing student learning gains. Human teachers' evaluations of these LM-generated worksheets show significant alignment between the LM judgments and human teacher preferences. We conclude by discussing potential divergences between human and LM opinions and the resulting pitfalls of automating instructional design.