Measures of classification bias derived from sample size analysis


measure · sample · bias · error rate · algorithm


Ioannis Ivrissimtzis, Shauna Concannon, Matthew Houliston, Graham Roberts

From arXiv, 9 pages, 3 figures

We propose the use of a simple intuitive principle for measuring algorithmic classification bias: the significance of the differences in a classifier's error rates across the various demographics is inversely commensurate with the sample size required to statistically detect them. That is, if large sample sizes are required to statistically establish biased behavior, the algorithm is less biased, and vice versa. In a simple setting, we assume two distinct demographics, and non-parametric estimates of the error rates on them, e1 and e2, respectively. We use a well-known approximate formula for the sample size of the chi-squared test, and verify some basic desirable properties of the proposed measure. Next, we compare the proposed measure with two other commonly used statistics, the difference e2-e1 and the ratio e2/e1 of the error rates. We establish that the proposed measure is essentially different in that it can rank algorithms for bias differently, and we discuss some of its advantages over the other two measures. Finally, we briefly discuss how some of the desirable properties of the proposed measure emanate from fundamental characteristics of the method, rather than the approximate sample size formula we used, and thus, are expected to hold in more complex settings with more than two demographics.
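The principle in the abstract can be illustrated with a minimal sketch. The abstract does not state which approximate sample size formula the authors use, so the code below assumes a common textbook approximation for comparing two proportions (equivalent in spirit to a chi-squared test on a 2x2 table). The function names `required_sample_size` and `bias_score`, the default significance level and power, and taking the bias score as the reciprocal of the sample size are all assumptions made for illustration, not the paper's definitions.

```python
from statistics import NormalDist


def required_sample_size(e1, e2, alpha=0.05, power=0.8):
    """Approximate per-group sample size needed to detect e1 != e2.

    Assumed formula (two-sided test of two proportions):
        n = (z_{1-alpha/2} + z_{power})^2 * (e1(1-e1) + e2(1-e2)) / (e1-e2)^2
    """
    if e1 == e2:
        # No difference to detect: no finite sample would establish bias.
        return float("inf")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = e1 * (1 - e1) + e2 * (1 - e2)
    return (z_alpha + z_beta) ** 2 * variance / (e1 - e2) ** 2


def bias_score(e1, e2, alpha=0.05, power=0.8):
    """Hypothetical bias score: larger when fewer samples reveal the gap."""
    return 1.0 / required_sample_size(e1, e2, alpha, power)


# Two hypothetical classifiers, each with error rates on two demographics.
classifier_a = (0.01, 0.03)  # difference 0.02, ratio 3.0
classifier_b = (0.40, 0.45)  # difference 0.05, ratio 1.125

for name, (e1, e2) in [("A", classifier_a), ("B", classifier_b)]:
    n = required_sample_size(e1, e2)
    print(f"{name}: diff={e2 - e1:.3f} ratio={e2 / e1:.3f} "
          f"n={n:.0f} score={bias_score(e1, e2):.6f}")
```

Under these assumptions, classifier A needs fewer samples than classifier B for its error gap to be detected, so it receives the higher score even though B has the larger difference e2-e1. This is one way the sample-size view can rank algorithms differently from the difference statistic.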

