Recent advances in large language models (LLMs) have led to the development of artificial intelligence (AI)-powered tutoring chatbots, which show promise in broadening access to high-quality personalized education. Existing work has primarily studied how to make LLMs follow tutoring principles, but not how to model student behavior in dialogues. However, analyzing student dialogue turns can serve as a formative assessment, since open-ended student discourse may indicate knowledge levels and reveal specific misconceptions. In this work, we present a first attempt at performing knowledge tracing (KT) in tutor-student dialogues. We propose LLM prompting methods to identify the knowledge components/skills involved in each dialogue turn and to diagnose whether the student responds to the tutor correctly, and we verify the LLM's effectiveness via an expert human evaluation. We then apply a range of KT methods to the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues. We perform extensive qualitative analyses to highlight the challenges of dialogue KT and outline multiple avenues for future work.
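To make the KT step concrete: given per-turn correctness labels like those the LLM produces, a classical KT baseline updates a per-skill mastery estimate after each turn. The sketch below uses a standard Bayesian Knowledge Tracing (BKT) update as a minimal illustration; the function name, parameter values, and turn labels are hypothetical and are not the paper's LLMKT method.

```python
def bkt_update(p_know, correct, slip=0.1, guess=0.2, learn=0.15):
    """One BKT step: Bayesian posterior over skill mastery given the
    observed response, followed by a learning transition. The slip,
    guess, and learn rates are illustrative defaults, not fitted values."""
    if correct:
        evidence_known = p_know * (1 - slip)
        evidence_unknown = (1 - p_know) * guess
    else:
        evidence_known = p_know * slip
        evidence_unknown = (1 - p_know) * (1 - guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Learning transition: the student may acquire the skill between turns.
    return posterior + (1 - posterior) * learn

# Track one hypothetical skill across labeled dialogue turns
# (True = responded correctly on that skill, False = incorrectly).
p = 0.3  # prior probability the student has mastered the skill
for correct in [True, True, False, True]:
    p = bkt_update(p, correct)
```

A correct response raises the mastery estimate and an incorrect one lowers it, which is the behavior any KT method applied to the labeled dialogue turns must capture.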