Large language models (LLMs) have advanced the development of personalized learning in education. However, their inherent generation mechanisms often produce homogeneous responses to identical prompts. This one-size-fits-all behavior overlooks the substantial heterogeneity in students' cognitive and psychological profiles, posing potential safety risks to vulnerable groups. Existing safety evaluations rely primarily on context-independent metrics such as factual accuracy, bias, or toxicity, which fail to capture the divergent harms that the same response may cause across different student attributes. To address this gap, we propose the concept of Student-Tailored Personalized Safety and construct CASTLE, a benchmark grounded in educational theories. CASTLE covers 15 educational safety risks and 14 student attributes, comprising 92,908 bilingual scenarios. We further design three evaluation metrics: Risk Sensitivity, which measures a model's ability to detect risks; Emotional Empathy, which evaluates a model's capacity to recognize student states; and Student Alignment, which assesses the match between model responses and student attributes. Experiments on 18 state-of-the-art (SOTA) LLMs demonstrate that CASTLE poses a significant challenge: all models scored below 2.3 on a 5-point average safety rating, indicating substantial deficiencies in personalized safety assurance.