Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis


April 25, 2023


Simone Lazier, Saravanan Thirumuruganathan, Hadis Anahideh

From arXiv. Accepted at the Algorithmic Fairness in Artificial Intelligence, Machine Learning and Decision Making workshop at SDM 2023.

Machine learning (ML) based approaches are increasingly used in applications with societal impact. Training ML models often requires vast amounts of labeled data, and crowdsourcing is a dominant paradigm for obtaining labels from multiple workers. Crowd workers may sometimes provide unreliable labels; to address this, truth discovery (TD) algorithms such as majority voting are applied to determine consensus labels from conflicting worker responses. However, these consensus labels may still be biased with respect to sensitive attributes such as gender, race, or political affiliation. Even when sensitive attributes are not involved, the labels can be biased because workers hold different perspectives on subjective aspects such as toxicity. In this paper, we conduct a systematic study of the bias and fairness of TD algorithms. Our findings on two existing crowd-labeled datasets reveal that a non-trivial proportion of workers provide biased results, and that using simple approaches for TD is sub-optimal. Our study also demonstrates that popular TD algorithms are not a panacea. Additionally, we quantify the impact of these unfair workers on downstream ML tasks and show that conventional methods for achieving fairness and correcting label biases are ineffective in this setting. We end the paper with a plea for the design of novel bias-aware truth discovery algorithms that can ameliorate these issues.
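To make the setting in the abstract concrete, the following is a minimal sketch of majority voting as a truth discovery step, followed by a simple check of whether the consensus labels differ across groups. Everything here is an illustrative assumption rather than the paper's method or data: the toy responses, the helper names `majority_vote` and `positive_rate_gap`, the tie-breaking rule, and the use of a demographic-parity-style gap as the bias measure.

```python
from collections import Counter, defaultdict

def majority_vote(responses):
    """Return {item_id: consensus_label} from (item_id, worker_id, label) triples."""
    votes = defaultdict(Counter)
    for item_id, _worker_id, label in responses:
        votes[item_id][label] += 1
    consensus = {}
    for item_id, counter in votes.items():
        top = max(counter.values())
        # Assumed tie-break: pick the smallest label among the tied ones.
        consensus[item_id] = min(l for l, c in counter.items() if c == top)
    return consensus

def positive_rate_gap(consensus, group_of, positive=1):
    """Return (gap, rates): per-group share of positive consensus labels
    and the difference between the highest and lowest share."""
    totals, positives = Counter(), Counter()
    for item_id, label in consensus.items():
        group = group_of[item_id]
        totals[group] += 1
        positives[group] += int(label == positive)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical toy data: label 1 = "toxic", 0 = "not toxic".
responses = [
    ("a1", "w1", 0), ("a1", "w2", 0), ("a1", "w3", 0),
    ("a2", "w1", 1), ("a2", "w2", 1), ("a2", "w3", 0),
    ("b1", "w1", 0), ("b1", "w2", 1), ("b1", "w3", 1),
    ("b2", "w1", 1), ("b2", "w2", 1), ("b2", "w3", 1),
]
# Hypothetical sensitive attribute attached to each item.
group_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

consensus = majority_vote(responses)
gap, rates = positive_rate_gap(consensus, group_of)
```

In this toy example the consensus marks half of group A's items and all of group B's items as positive, so the gap is 0.5. A gap like this does not by itself prove worker bias, since the true labels may differ across groups, but it illustrates how a consensus produced by majority voting can carry group-level disparities into downstream training data.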






