Understanding how students with different proficiency levels respond to educational materials is a critical problem in AI for Education. However, acquiring sufficient real student response data for robust evaluation is often hindered by cost, ethics, and security constraints. Consequently, LLM-based student proficiency simulation, especially via prompt-based methods, has emerged as a practical alternative under data-scarce conditions. Despite their promise, current methods still exhibit limited controllability due to coarse-grained proficiency representations, high sensitivity to prompt design, and a lack of calibration against academic performance. We therefore propose Parameterized Student Proficiency Simulation (PS$^2$), an unsupervised, parameterized, model-level framework that simulates students of different proficiencies by interpolating between a strong upper-bound LLM and a weaker, cognitive-error-informed lower-bound student LLM via a hybrid ratio. Specifically, the lower-bound model is constructed by fine-tuning the weaker LLM to exhibit cognitive errors when responding to educational materials. To ensure alignment with target proficiency levels, PS$^2$ further calibrates the interpolation ratio against academic performance. Experiments on two public datasets demonstrate that PS$^2$ achieves finer-grained and more consistent proficiency simulation than existing baselines, yielding superior performance in student behavior similarity and item difficulty prediction.
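The hybrid-ratio interpolation described above can be sketched as a convex combination of the two models' parameters. This is a minimal illustrative sketch, not the paper's implementation: the function name, the use of plain float dictionaries in place of real model weight tensors, and the toy parameter values are all assumptions introduced here.

```python
# Hypothetical sketch of model-level proficiency interpolation:
# a simulated student's parameters are a convex combination of an
# upper-bound (strong) model and a cognitive-error-informed
# lower-bound model, controlled by a single hybrid ratio.
# Plain float dicts stand in for real weight tensors.

def interpolate_params(strong, weak, ratio):
    """Blend two parameter dicts: ratio=1.0 -> strong, ratio=0.0 -> weak."""
    assert strong.keys() == weak.keys(), "models must share parameter names"
    assert 0.0 <= ratio <= 1.0, "hybrid ratio must lie in [0, 1]"
    return {k: ratio * strong[k] + (1.0 - ratio) * weak[k] for k in strong}

# Toy example: a single "layer" with two scalar weights (illustrative values).
upper = {"w0": 1.0, "w1": -0.5}
lower = {"w0": 0.2, "w1": 0.3}

# A mid-proficiency student sits halfway between the two bounds.
mid = interpolate_params(upper, lower, 0.5)
```

In the paper's framework the ratio itself is not fixed by hand but calibrated so that the simulated student's responses align with target academic performance levels.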