Demographic biases in source datasets have been shown to be one of the causes of unfairness and discrimination in the predictions of Machine Learning models. One of the most prominent types of demographic bias is statistical imbalance in the representation of demographic groups in the datasets. In this paper, we study the measurement of these biases by reviewing existing metrics, including those that can be borrowed from other disciplines. We develop a taxonomy for classifying these metrics, providing a practical guide for selecting appropriate ones. To illustrate the utility of our framework, and to further understand the practical characteristics of the metrics, we conduct a case study of 20 datasets used in Facial Emotion Recognition (FER), analyzing the biases present in them. Our experimental results show that many metrics are redundant and that a reduced subset of metrics may be sufficient to measure the amount of demographic bias. The paper provides valuable insights for researchers in AI and related fields seeking to mitigate dataset bias and improve the fairness and accuracy of AI models. The code is available at https://github.com/irisdominguez/dataset_bias_metrics.
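As a rough illustration of what measuring representational imbalance can look like, the sketch below computes three common imbalance measures from the demographic labels of a dataset. These are an imbalance ratio, Shannon evenness (entropy normalized by its maximum, a measure borrowed from information theory and ecology), and a normalized Gini-Simpson index. The choice of these three metrics, the function names, and the example age-group labels are all assumptions made for illustration. They are not claimed to match the paper's taxonomy or the code in the linked repository.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def proportions(labels: Sequence[Hashable]) -> List[float]:
    """Share of samples belonging to each observed demographic group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def imbalance_ratio(labels: Sequence[Hashable], num_groups: int) -> float:
    """Size of the largest group divided by the smallest.

    Returns inf if any of the num_groups expected groups has no samples.
    """
    counts = Counter(labels)
    if len(counts) < num_groups:
        return math.inf
    return max(counts.values()) / min(counts.values())


def shannon_evenness(labels: Sequence[Hashable], num_groups: int) -> float:
    """Shannon entropy divided by ln(K); 1.0 means perfectly balanced."""
    if num_groups < 2:
        raise ValueError("evenness needs at least two groups")
    entropy = -sum(p * math.log(p) for p in proportions(labels))
    return entropy / math.log(num_groups)


def simpson_evenness(labels: Sequence[Hashable], num_groups: int) -> float:
    """Gini-Simpson index (1 - sum p^2) divided by its maximum, 1 - 1/K."""
    if num_groups < 2:
        raise ValueError("evenness needs at least two groups")
    gini_simpson = 1.0 - sum(p * p for p in proportions(labels))
    return gini_simpson / (1.0 - 1.0 / num_groups)


if __name__ == "__main__":
    # Hypothetical age-group annotations for a 100-image dataset.
    ages = ["20-29"] * 60 + ["30-39"] * 30 + ["40-49"] * 10
    print(imbalance_ratio(ages, 3))                # 6.0
    print(round(shannon_evenness(ages, 3), 3))     # 0.817
    print(round(simpson_evenness(ages, 3), 3))     # 0.81
```

Passing the expected number of groups explicitly matters: a group that is entirely missing from the dataset does not appear among the observed labels, so only the caller can tell the metric that it should have been there. The two evenness scores happen to agree closely on this toy distribution. That kind of agreement is the sort of overlap that could make some metrics redundant in practice.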