Auditing Stance Asymmetry in Generative Explanations

Bias evaluation for language models has made substantial progress on bounded comparisons, such as overt derogation, stereotype association, or label-sensitive differences under controlled substitutions. Open-ended explanations raise a different problem: they guide interpretation by assigning responsibility, legitimacy, context, and grievance. A model can avoid hostile language while presenting one side as structurally understandable and the other as personally at fault, overreacting, or less worth taking seriously. We call this stance-bearing asymmetry in generative explanations. We propose Symmetry Decomposition Evaluation (SDE), which tests paired situations with concrete group labels, structural-role rewrites, and explicit support or counter-evidence. In a controlled 32-family prototype suite, this decomposition shows that surface differences are not all alike: some weaken under structural or evidence control, while others remain as stable differences in how the model assigns blame, context, or legitimacy. Targeted case review and judge comparison suggest a broader difficulty for evaluating open-ended framing asymmetries: judge readings shift across operationalizations, and scalar scores can flatten distinctions that readers use to interpret explanatory stance. SDE therefore reframes generative bias evaluation as an audit of explanatory stance: what stance each side receives, how it changes under decomposition, and where automatic scoring becomes unstable.
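As a rough illustration of the decomposition described above, the following Python sketch expands one paired situation into its prompt variants: concrete group labels versus structural-role rewrites, each with no added evidence, supporting evidence, or counter-evidence. It is only a minimal sketch under stated assumptions. The `Family` fields, the `build_prompts` helper, the condition names, and the example scenario are all hypothetical and are not taken from the paper, which does not specify its prompt format here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Family:
    """One paired situation (hypothetical structure, not the paper's schema)."""
    template: str        # scenario text with an {actor} slot
    group_labels: tuple  # (side_a, side_b) concrete group labels
    role_labels: tuple   # (side_a, side_b) structural-role rewrites
    support: str         # sentence adding supporting evidence
    counter: str         # sentence adding counter-evidence


def build_prompts(family):
    """Return {(framing, evidence, side): prompt} for every condition."""
    framings = {"label": family.group_labels, "role": family.role_labels}
    evidence = {"none": "", "support": family.support, "counter": family.counter}
    prompts = {}
    for (fname, sides), (ename, etext) in product(framings.items(), evidence.items()):
        for side, actor in zip(("a", "b"), sides):
            text = family.template.format(actor=actor)
            if etext:  # append the evidence sentence only when one is given
                text = f"{text} {etext}"
            prompts[(fname, ename, side)] = text
    return prompts


# Invented placeholder scenario, used only to show the shape of the output.
example = Family(
    template="Explain why {actor} objected to the new policy.",
    group_labels=("members of Group X", "members of Group Y"),
    role_labels=("the lower-power party", "the higher-power party"),
    support="Records show the policy reduced their access to services.",
    counter="Records show the policy did not change their access to services.",
)

prompts = build_prompts(example)
print(len(prompts))  # 2 framings x 3 evidence settings x 2 sides = 12
for key in sorted(prompts):
    print(key, "->", prompts[key])
```

Each pair of prompts that differs only in `side` can then be sent to the model under audit, and the resulting explanations compared condition by condition. The sketch deliberately stops before scoring, since the abstract cautions that scalar scores can flatten the distinctions readers rely on.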
