The Confidence Trap: Gender Bias and Predictive Certainty in LLMs

Confidence · Gender Bias · Language Models · Fairness · Large Language Models


Ahmed Sabir, Markus Kängsepp, Rajesh Sharma

From arXiv. AAAI 2026 (AISI Track), Oral. Project page: https://bit.ly/4p8OKQD

The increased use of Large Language Models (LLMs) in sensitive domains leads to growing interest in how their confidence scores correspond to fairness and bias. This study examines the alignment between LLM-predicted confidence and human-annotated bias judgments. Focusing on gender bias, the research investigates probability confidence calibration in contexts involving gendered pronoun resolution. The goal is to evaluate if calibration metrics based on predicted confidence scores effectively capture fairness-related disparities in LLMs. The results show that, among the six state-of-the-art models, Gemma-2 demonstrates the worst calibration according to the gender bias benchmark. The primary contribution of this work is a fairness-aware evaluation of LLMs' confidence calibration, offering guidance for ethical deployment. In addition, we introduce a new calibration metric, Gender-ECE, designed to measure gender disparities in resolution tasks.
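To make the idea of comparing calibration across gender groups concrete, here is a minimal sketch. The abstract does not define Gender-ECE, so this is not the authors' metric: it computes the standard expected calibration error (ECE) with equal-width bins, then reports it separately per group. The function names `ece` and `ece_by_group`, the group labels, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def ece(confidences, correct, n_bins=10):
    """Standard expected calibration error using equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b + 1]]; a confidence of 0.0 falls in bin 0.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    total = len(confidences)
    error = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of predictions that landed in the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            error += mask.sum() / total * gap
    return error


def ece_by_group(confidences, correct, groups, n_bins=10):
    """ECE computed separately for each group label (hypothetical helper)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    groups = np.asarray(groups)
    return {
        str(g): ece(confidences[groups == g], correct[groups == g], n_bins)
        for g in np.unique(groups)
    }


if __name__ == "__main__":
    # Toy data invented for this sketch; not taken from the paper.
    conf = [0.95, 0.95, 0.85, 0.85, 0.95, 0.95, 0.85, 0.85]
    hit = [1, 1, 1, 0, 1, 0, 0, 0]
    grp = ["female"] * 4 + ["male"] * 4
    per_group = ece_by_group(conf, hit, grp)
    print(per_group)
    print("gap:", abs(per_group["female"] - per_group["male"]))
```

A large difference between the per-group values would suggest that the model's confidence is less trustworthy for one group than for the other, which is the kind of disparity a single pooled ECE can hide.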

