As Large Language Models (LLMs) increasingly mediate stigmatized health decisions, their capacity to understand complex psychological phenomena remains inadequately assessed. Can LLMs understand what we cannot say? We investigate whether LLMs coherently represent abortion stigma across cognitive, interpersonal, and structural levels. We systematically tested 627 demographically diverse personas across five leading LLMs using the validated Individual Level Abortion Stigma Scale (ILAS), examining representation at the cognitive (self-judgment), interpersonal (worries about judgment and isolation), and structural (community condemnation and disclosure patterns) levels. Models fail tests of genuine understanding across all dimensions. They underestimate cognitive stigma while overestimating interpersonal stigma; introduce demographic biases, assigning higher stigma to younger, less educated, and non-White personas; and treat secrecy as universal despite 36% of humans reporting openness. Most critically, models produce internal contradictions: they overestimate isolation yet predict that isolated individuals are less secretive, revealing incoherent representations. These patterns show that current alignment approaches ensure appropriate language but not coherent understanding across levels. This work provides empirical evidence that LLMs lack coherent understanding of psychological constructs operating across multiple dimensions. AI safety in high-stakes contexts demands new approaches to design (multilevel coherence), evaluation (continuous auditing), governance and regulation (mandatory audits, accountability, deployment restrictions), and AI literacy in domains where understanding what people cannot say determines whether support helps or harms.