Learning Context Matters: Measuring and Diagnosing Personalization Gaps in LLM-Based Instructional Design

The adoption of generative AI in education has accelerated in recent years, with Large Language Models (LLMs) increasingly integrated into learning environments in the hope of providing personalized support that enhances learner engagement and knowledge retention. However, truly personalized support requires access to a meaningful Learning Context (LC): who the learner is, what they are trying to understand, and how they are engaging with the material. In this paper, we present a framework for measuring and diagnosing how the LC influences instructional strategy selection in LLM-based tutoring systems. Using psychometrically grounded synthetic learning contexts and a pedagogically grounded decision space, we compare LLM instructional decisions in context-blind and context-aware conditions and quantify their alignment with the pedagogical judgments of subject matter experts. Our results show that providing the LC induces systematic, measurable changes in instructional decisions that move LLM policies closer to the subject matter expert policy, yet substantial misalignment remains. To diagnose this misalignment, we introduce a relevance-impact analysis that reveals which learner characteristics are attended to, ignored, or spuriously influential in LLM instructional decision-making. This analysis, conducted in collaboration with subject matter experts, demonstrates that the LC materially shapes LLM instructional planning but does not reliably induce pedagogically appropriate personalization. Our framework and results enable principled evaluation of context-aware LLM systems and provide a foundation for improving personalization through learner characteristic prioritization, pedagogical model tuning, and LC engineering.
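To make the comparison described above concrete, the following is a minimal sketch and not the method used in the paper. It assumes that alignment is measured as the exact-match agreement rate between LLM and expert strategy choices, and that the relevance-impact analysis can be approximated by crossing an expert relevance label with a numeric impact score. The strategy names, the toy decisions, the 0.5 threshold, and the label for the fourth quadrant are all hypothetical illustrations.

```python
def agreement_rate(llm_decisions, expert_decisions):
    """Fraction of learning contexts where the LLM picks the expert's strategy.

    Assumption: exact-match agreement stands in for whatever alignment
    measure the paper actually uses.
    """
    if len(llm_decisions) != len(expert_decisions):
        raise ValueError("decision lists must have equal length")
    if not expert_decisions:
        raise ValueError("need at least one decision")
    matches = sum(a == b for a, b in zip(llm_decisions, expert_decisions))
    return matches / len(expert_decisions)


def diagnose(expert_relevant, impact, threshold=0.5):
    """Place one learner characteristic in a relevance-impact quadrant.

    expert_relevant: whether experts consider the characteristic relevant.
    impact: hypothetical score in [0, 1] for how strongly changing the
            characteristic changes the LLM's decision.
    threshold: arbitrary cutoff chosen for illustration only.
    """
    influential = impact >= threshold
    if expert_relevant and influential:
        return "attended"
    if expert_relevant:
        return "ignored"
    if influential:
        return "spurious"
    return "irrelevant and unused"  # label invented for this sketch


# Toy data, invented for illustration (one entry per synthetic learning context).
expert = ["worked_example", "socratic_questioning",
          "direct_explanation", "worked_example"]
context_blind = ["direct_explanation"] * 4
context_aware = ["worked_example", "direct_explanation",
                 "direct_explanation", "worked_example"]

blind_score = agreement_rate(context_blind, expert)
aware_score = agreement_rate(context_aware, expert)
print(f"context-blind agreement: {blind_score:.2f}")
print(f"context-aware agreement: {aware_score:.2f}")
print(diagnose(expert_relevant=True, impact=0.1))
print(diagnose(expert_relevant=False, impact=0.9))
```

In this toy setup the context-aware condition agrees with the expert more often than the context-blind one but still falls short of full agreement, which mirrors the qualitative pattern reported above without reproducing any of the paper's numbers.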
