Accurately assessing student knowledge is central to education. Cognitive Diagnosis (CD) models estimate student proficiency at a fixed point in time, while Knowledge Tracing (KT) methods model evolving knowledge states to predict future performance. However, existing approaches either provide quantitative concept-mastery estimates with limited expressivity (CD, probabilistic KT) or prioritize predictive accuracy at the cost of interpretability (deep-learning KT). We propose Language Bottleneck Models (LBMs), in which an encoder LLM produces a textual knowledge-state summary that a decoder LLM uses to predict future performance. This yields interpretable summaries that can express nuanced insights, such as misconceptions, that CD and KT models cannot capture. Extensive validation on synthetic and real-world datasets shows that LBMs reveal qualitative insights beyond what CD and KT models can capture, while achieving competitive accuracy with improved sample efficiency. We demonstrate that the encoder and decoder can be fine-tuned, with reinforcement learning and supervised fine-tuning respectively, to improve both summary quality and predictive performance.
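The encoder-summary-decoder pipeline described above can be illustrated with a minimal sketch. Here `encoder_llm` and `decoder_llm` are hypothetical rule-based stand-ins for the actual LLM calls; in the proposed method both roles are played by large language models, and the textual summary is the only channel between them (the bottleneck).

```python
# Minimal sketch of the Language Bottleneck Model (LBM) pipeline.
# encoder_llm and decoder_llm are hypothetical stand-ins for real LLM calls.

def encoder_llm(history):
    # Compress an interaction history into a textual knowledge-state summary.
    # history: list of (concept, answered_correctly) pairs.
    mastered = sorted({c for c, ok in history if ok})
    struggling = sorted({c for c, ok in history if not ok})
    return (f"Mastered: {', '.join(mastered)}. "
            f"Struggling: {', '.join(struggling)}.")

def decoder_llm(summary, concept):
    # Predict future performance on `concept` from the summary alone;
    # the decoder never sees the raw interaction history.
    mastered_part = summary.split("Struggling:")[0]
    return concept in mastered_part

history = [("fractions", True), ("decimals", False), ("fractions", True)]
summary = encoder_llm(history)
print(summary)                            # human-readable knowledge state
print(decoder_llm(summary, "fractions"))  # True
print(decoder_llm(summary, "decimals"))   # False
```

Because the bottleneck is plain text, a teacher can read the summary directly, and the encoder can in principle express misconceptions that a fixed concept-mastery vector cannot.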