Simulating Novice Students Using Machine Unlearning and Relearning in Large Language Models

Student simulation can support learning-by-teaching pedagogy, in which human students (as tutors) teach AI-simulated novice students (as tutees). Recent research often relies on prompt engineering with large language models (LLMs) to simulate novice student behaviour, but it is difficult to keep the AI-simulated student at a stable novice knowledge level. A key reason is that many LLMs are trained to be broadly capable, so even when prompted to "act like a novice," they can still produce expert-level explanations during learning-by-teaching interactions. As a result, the AI-simulated student may drift beyond the intended knowledge level, reducing the credibility of the simulation for studying learning-by-teaching processes. To address this, we propose a knowledge-level simulation approach based on machine unlearning. We investigate this approach using a dataset of multiple-choice questions on Python programming concepts. We apply machine unlearning to transform a knowledgeable LLM into a novice-level AI student (i.e., a teachable agent), then evaluate whether the teachable agent can relearn targeted knowledge components through learning-by-teaching dialogue interactions. Finally, we analyse the dialogue logs to characterise how the agent's behaviour changes over time, including its question-asking, error patterns, and responsiveness to instruction. The results show that (1) unlearning produces simulated student agents with more novice-like responses than prompt-only baselines, (2) the agents recover a measurable portion of the unlearned knowledge under structured exposure, and (3) dialogue analyses reveal identifiable trajectories of conceptual change and teaching moves that predict learning recovery.
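One way to make the "measurable portion of the unlearned knowledge" in result (2) concrete is to compare multiple-choice accuracy at three stages: the original model, the unlearned agent, and the agent after teaching dialogues. The sketch below is illustrative only. The `recovery_fraction` metric, the function names, and the toy answer lists are assumptions introduced here, not the evaluation protocol or data reported in this work.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of multiple-choice questions answered correctly."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("predictions and answers must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def recovery_fraction(acc_original: float, acc_unlearned: float, acc_relearned: float) -> float:
    """Share of the accuracy lost to unlearning that is regained after teaching.

    Hypothetical metric: 0.0 means nothing was recovered, 1.0 means the agent
    is back at the original model's accuracy.
    """
    forgotten = acc_original - acc_unlearned
    if forgotten <= 0:
        raise ValueError("unlearning produced no measurable drop in accuracy")
    return (acc_relearned - acc_unlearned) / forgotten


# Toy data (made up for illustration): the answer key and the options chosen at each stage.
answer_key = ["A", "C", "B", "D"]
original_choices = ["A", "C", "B", "D"]   # knowledgeable LLM before unlearning
unlearned_choices = ["A", "B", "D", "A"]  # novice-level agent after unlearning
relearned_choices = ["A", "C", "B", "A"]  # agent after learning-by-teaching dialogues

acc_original = accuracy(original_choices, answer_key)
acc_unlearned = accuracy(unlearned_choices, answer_key)
acc_relearned = accuracy(relearned_choices, answer_key)

print(f"original={acc_original:.2f} unlearned={acc_unlearned:.2f} relearned={acc_relearned:.2f}")
print(f"recovery fraction={recovery_fraction(acc_original, acc_unlearned, acc_relearned):.2f}")
```

Normalising by the accuracy lost to unlearning, rather than reporting raw post-teaching accuracy, keeps the measure comparable across knowledge components that were forgotten to different degrees.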
