Demographic biases in source datasets have been shown to be one of the causes of unfairness and discrimination in the predictions of Machine Learning models. One of the most prominent types of demographic bias is statistical imbalance in the representation of demographic groups in the datasets. In this paper, we study the measurement of these biases by reviewing the existing metrics, including those that can be borrowed from other disciplines. We develop a taxonomy for the classification of these metrics, providing a practical guide for selecting appropriate ones. To illustrate the utility of our framework, and to further characterize the practical behavior of the metrics, we conduct a case study of 20 datasets used in Facial Emotion Recognition (FER), analyzing the biases present in them. Our experimental results show that many metrics are redundant and that a reduced subset may suffice to measure the amount of demographic bias. The paper provides valuable insights for researchers in AI and related fields seeking to mitigate dataset bias and improve the fairness and accuracy of AI models. The code is available at https://github.com/irisdominguez/dataset_bias_metrics.