People often encounter role conflicts: social dilemmas in which the expectations of multiple roles clash and cannot all be fulfilled at once. As large language models (LLMs) increasingly navigate these social dynamics, a critical research question emerges: when faced with such dilemmas, do LLMs prioritize dynamic contextual cues or their learned preferences? To address this question, we introduce RoleConflictBench, a novel benchmark designed to measure the contextual sensitivity of LLMs in role conflict scenarios. To enable objective evaluation in this subjective domain, we use situational urgency as a constraint on decision-making. We construct the dataset through a three-stage pipeline that generates over 13,000 realistic scenarios across 65 roles in five social domains by systematically varying the urgency of the competing situations. This controlled setup lets us quantitatively measure contextual sensitivity, that is, whether model decisions align with the situational context or are overridden by learned role preferences. Our analysis of 10 LLMs reveals that models deviate substantially from this urgency-based objective baseline: instead of responding to dynamic contextual cues, their decisions are predominantly governed by preferences for specific social roles.