Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using LLMs to author Intelligent Tutoring Systems. A common pitfall of LLMs is that they stray from desired pedagogical strategies, for example by leaking the answer to the student, and, more generally, that they provide no guarantees about their behavior. We posit that while LLMs with certain guardrails can take the place of subject experts, the overall pedagogical design still needs to be handcrafted for the best learning results. Based on this principle, we create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer. This approach retains the structure and pedagogy of traditional tutoring systems, which learning scientists have developed over the years, while bringing in the additional flexibility of LLM-based approaches. Through a human evaluation study on two datasets based on math word problems, we show that our hybrid approach achieves a better overall tutoring score than an instructed, but otherwise free-form, GPT-4. MWPTutor is completely modular and opens up the scope for the community to improve its performance by improving individual modules or by using different teaching strategies that it can follow.
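To make the idea of an LLM filling in the states of a pre-defined finite state transducer more concrete, the following is a minimal sketch and not MWPTutor's actual implementation. It assumes a hand-written transition structure (ask, hint, reveal the step, advance) whose state contents would come from an LLM. All names here (`Step`, `TutorFST`, `stub_step_generator`, `max_hints`) are hypothetical, and the LLM call is replaced by a hard-coded stub so that the example runs offline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One tutoring state: a sub-question, its expected answer, and a hint."""
    prompt: str
    answer: str
    hint: str


def stub_step_generator(problem: str) -> List[Step]:
    """Hypothetical stand-in for an LLM call that splits a problem into steps.

    Hard-coded so the sketch runs without a model; a real system would
    prompt an LLM here and validate what it returns.
    """
    return [
        Step("How many apples does Ann start with?", "5",
             "Re-read the first sentence."),
        Step("How many are left after she gives away 2?", "3",
             "Subtract 2 from the starting amount."),
    ]


class TutorFST:
    """Fixed, hand-written transitions; only the state contents are generated."""

    def __init__(self, steps: List[Step], max_hints: int = 1) -> None:
        self.steps = steps
        self.max_hints = max_hints  # assumed policy: hints before revealing a step
        self.index = 0
        self.hints_given = 0

    @property
    def done(self) -> bool:
        return self.index >= len(self.steps)

    def current_prompt(self) -> str:
        return self.steps[self.index].prompt

    def _advance(self) -> None:
        self.index += 1
        self.hints_given = 0

    def respond(self, student_reply: str) -> str:
        """Consume one student reply and emit the tutor's next utterance."""
        if self.done:
            return "We have finished this problem."
        step = self.steps[self.index]
        if student_reply.strip() == step.answer:
            self._advance()
            return "Correct." if self.done else "Correct. " + self.current_prompt()
        if self.hints_given < self.max_hints:
            # The transition table, not the model, decides that a hint comes first.
            self.hints_given += 1
            return "Not quite. Hint: " + step.hint
        # Hints exhausted: reveal this step only, then move on.
        self._advance()
        reveal = f"The answer to this step is {step.answer}."
        return reveal if self.done else reveal + " " + self.current_prompt()


tutor = TutorFST(stub_step_generator("Ann has 5 apples and gives away 2."))
print(tutor.current_prompt())
print(tutor.respond("4"))  # wrong reply: a hint, not the answer
print(tutor.respond("5"))  # right reply: move to the next step
```

Because the order of hinting and revealing lives in the transition logic rather than in a prompt, a property such as "a hint always precedes any reveal" holds by construction in this sketch, which is the kind of guarantee a free-form model does not offer.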