Misconception Acquisition Dynamics in Large Language Models

Effective educational AI depends on modeling student misconceptions. Such models enable realistic learner simulation and diagnostic, adaptive tutoring. However, instruction-tuning large language models (LLMs) on student responses that contain misconception-driven errors can degrade their reasoning abilities, creating a tension between modeling misconceptions faithfully and preserving correct reasoning in other contexts. To support both learner simulation and tutoring, we study two misconception-aware models: the Novice Student Misconception Model, trained to acquire a single misconception for simulating an individual student, and the Expert Tutor Misconception Model, trained on multiple misconceptions to capture the error patterns a tutor encounters across students. To study the misconception acquisition dynamics of both models, we develop MalAlgoLib, a library that generates algebra problems with correct solution traces and misconception-specific erroneous traces. Our experiments across three LLMs reveal that the student and tutor models exhibit fundamentally different misconception acquisition dynamics. For the student model, a single misconception is not learned as a context-specific behavior: models overapply it across problems, degrading correct-solving accuracy unless training includes correct examples to enforce boundaries. In contrast, the tutor model can learn multiple misconceptions jointly without sacrificing correct-solving accuracy. Critically, intermediate reasoning steps are the bottleneck. With final-answer supervision alone, models cannot learn where the error enters the solution, so neither the student model nor the tutor model acquires misconceptions, regardless of data size. Together, these results, enabled by MalAlgoLib, provide an interpretable account of misconception acquisition under instruction tuning and guidance for training misconception-aware LLMs while preserving correct reasoning.
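To make the distinction between step-level and final-answer supervision concrete, the following is a minimal sketch of paired correct and erroneous solution traces for a linear equation. It is not MalAlgoLib's API: the function names, the chosen misconception (a sign error when moving a constant across the equals sign), and the prompt/target record layout are all assumptions made for illustration.

```python
from fractions import Fraction

# Illustrative sketch only: these helpers are hypothetical and are NOT
# taken from MalAlgoLib. Assumes a != 0 and b > 0 to keep formatting simple.

def correct_trace(a: int, b: int, c: int):
    """Solve a*x + b = c with the standard steps; return (steps, answer)."""
    rhs = c - b                      # subtract b from both sides
    x = Fraction(rhs, a)             # divide both sides by a
    steps = [
        f"{a}x + {b} = {c}",
        f"{a}x = {c} - {b} = {rhs}",
        f"x = {rhs}/{a} = {x}",
    ]
    return steps, x

def sign_error_trace(a: int, b: int, c: int):
    """Assumed misconception: the constant keeps its sign when moved across '='."""
    rhs = c + b                      # the error enters here
    x = Fraction(rhs, a)
    steps = [
        f"{a}x + {b} = {c}",
        f"{a}x = {c} + {b} = {rhs}",
        f"x = {rhs}/{a} = {x}",
    ]
    return steps, x

def to_example(steps, answer, with_steps: bool = True):
    """Build a training record with either the full trace or only the final answer."""
    target = "\n".join(steps[1:]) if with_steps else f"x = {answer}"
    return {"prompt": f"Solve for x: {steps[0]}", "target": target}

if __name__ == "__main__":
    for make_trace in (correct_trace, sign_error_trace):
        steps, answer = make_trace(2, 3, 11)
        print(to_example(steps, answer, with_steps=True))
        print(to_example(steps, answer, with_steps=False))
```

For 2x + 3 = 11, the correct trace ends at x = 4 and the sign-error trace at x = 7. With `with_steps=False` the two records differ only in the final number, so the target carries no indication of which step went wrong. This is one way to picture why final-answer supervision alone might not convey where the error enters.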
