Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

The Identifiable Victim Effect (IVE), the tendency to allocate greater resources to a specific, narratively described victim than to a statistically characterized group facing equivalent hardship, is one of the most robust findings in moral psychology and behavioural economics. As large language models (LLMs) assume consequential roles in humanitarian triage, automated grant evaluation, and content moderation, a critical question arises: do these systems inherit the affective irrationalities present in human moral reasoning? We present the first systematic, large-scale empirical investigation of the IVE in LLMs, comprising N=51,955 validated API trials across 16 frontier models spanning nine organizational lineages (Google, Anthropic, OpenAI, Meta, DeepSeek, xAI, Alibaba, IBM, and Moonshot). Using a suite of ten experiments that port and extend canonical paradigms from Small et al. (2007) and Kogut and Ritov (2005), we find that the IVE is prevalent but strongly modulated by alignment training. Instruction-tuned models exhibit extreme IVE (Cohen's d up to 1.56), while reasoning-specialized models invert the effect (down to d=-0.85). The pooled effect (d=0.223, p=2e-6) is approximately twice the single-victim human meta-analytic baseline (d$\approx$0.10) reported by Lee and Feeley (2016), and likely exceeds the overall human pooled effect by a larger margin, given that the group-victim human effect is near zero. Standard Chain-of-Thought (CoT) prompting, contrary to its role as a deliberative corrective, nearly triples the IVE effect size (from d=0.15 to d=0.41), while only utilitarian CoT reliably eliminates it. We further document psychophysical numbing, perfect quantity neglect, and marginal in-group/out-group cultural bias, with implications for AI deployment in humanitarian and ethical decision-making contexts.
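
For readers unfamiliar with the effect-size metric quoted above, the following minimal sketch shows one common way to compute Cohen's d between an identified-victim condition and a statistical-victim condition. It assumes the pooled-standard-deviation form of d; the abstract does not state which variant the study uses. The function name `cohens_d` and the allocation values are hypothetical and purely illustrative, not data from the experiments.

```python
import math
from statistics import mean, variance


def cohens_d(identified, statistical):
    """Cohen's d with a pooled standard deviation (an assumed variant).

    Positive values mean larger allocations to the identified victim;
    negative values correspond to an inverted effect.
    """
    n1, n2 = len(identified), len(statistical)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n1 - 1) * variance(identified) + (n2 - 1) * variance(statistical)
    ) / (n1 + n2 - 2)
    return (mean(identified) - mean(statistical)) / math.sqrt(pooled_var)


# Hypothetical allocation amounts, made up for illustration only.
identified = [4.0, 5.0, 3.0, 5.0, 4.0]
statistical = [3.0, 4.0, 2.0, 4.0, 3.0]

d = cohens_d(identified, statistical)
d_inverted = cohens_d(statistical, identified)
```

Swapping the two arguments flips the sign of d, which is how a negative value such as the one reported for reasoning-specialized models would arise under this convention.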
