Recent advances in large language models (LLMs) have led to the development of artificial intelligence (AI)-powered tutoring chatbots, which show promise in providing broad access to high-quality personalized education. Existing works have studied how to make LLMs follow tutoring principles, but have not explored broader uses of LLMs for supporting tutoring. Up until now, tracing student knowledge and analyzing misconceptions have been difficult and time-consuming to implement for open-ended dialogue tutoring. In this work, we investigate whether LLMs can support this task: we first use LLM prompting methods to identify the knowledge components/skills involved in each dialogue turn, i.e., a tutor utterance posing a task or a student utterance that responds to it. We also evaluate whether the student responds correctly to the tutor and verify the LLM's accuracy using human expert annotations. We then apply a range of knowledge tracing (KT) methods to the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues. We perform extensive qualitative analyses to highlight the challenges in dialogue KT and outline multiple avenues for future work.
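As a minimal illustration of the pipeline described above, the sketch below assumes dialogue turns have already been labeled by the LLM (with knowledge components and response correctness) and then applies a classic Bayesian Knowledge Tracing update per component. All names, parameters, and the toy data are illustrative; the paper's LLMKT model is an LLM-based method, not the BKT variant shown here.

```python
# Hypothetical sketch: KT over LLM-labeled dialogue turns.
# Step 1 (LLM tagging of knowledge components and correctness) is
# represented by pre-labeled toy data; step 2 uses a standard BKT update.

def bkt_update(p_mastery, correct, p_learn=0.2, p_slip=0.1, p_guess=0.2):
    """One Bayesian Knowledge Tracing step for a single knowledge component.

    Computes the posterior mastery probability given the observed response,
    then applies the learning transition. Parameter values are illustrative.
    """
    if correct:
        num = p_mastery * (1 - p_slip)
        den = num + (1 - p_mastery) * p_guess
    else:
        num = p_mastery * p_slip
        den = num + (1 - p_mastery) * (1 - p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p_learn

# Toy labeled turns: (knowledge components tagged by the LLM, student correct?)
turns = [
    ({"fraction-addition"}, True),
    ({"fraction-addition", "common-denominator"}, False),
    ({"common-denominator"}, True),
]

mastery = {}  # per-component mastery estimate, with an assumed prior of 0.3
for kcs, correct in turns:
    for kc in kcs:
        prior = mastery.get(kc, 0.3)
        mastery[kc] = bkt_update(prior, correct)
```

The per-component estimates in `mastery` play the role of the knowledge levels tracked over the dialogue; a correct response raises the estimate for every component tagged on that turn, and an incorrect one lowers it.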