Gender classification systems often inherit and amplify demographic imbalances in their training data. We first audit five widely used gender classification datasets, revealing that all suffer from significant intersectional underrepresentation. To measure the downstream impact of these flaws, we train identical MobileNetV2 classifiers on the two most balanced of these datasets, UTKFace and FairFace. Our fairness evaluation shows that even these models exhibit substantial bias, misclassifying female faces at a higher rate than male faces and amplifying existing racial skew. To counter these data-induced biases, we construct BalancedFace, a new public dataset created by blending images from FairFace and UTKFace, supplemented with images from other collections to fill missing demographic gaps. It is engineered to equalize subgroup shares across 189 intersections of age, race, and gender using only real, unedited images. When a standard classifier is trained on BalancedFace, it reduces the maximum True Positive Rate gap across racial subgroups by over 50% and brings the average Disparate Impact score 63% closer to the ideal of 1.0 compared to the next-best dataset, all with minimal loss of overall accuracy. These results underscore the value of data-centric interventions and provide an openly available resource for fair gender classification research.
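As a concrete illustration of the two fairness metrics named above, the following is a minimal sketch (not the paper's evaluation code; all data and group labels are hypothetical) of how the maximum True Positive Rate gap across subgroups and a min/max generalization of the Disparate Impact score can be computed:

```python
# Minimal sketch of the two fairness metrics from the abstract.
# NOTE: toy data and group labels are hypothetical; the paper's actual
# evaluation pipeline is not shown here.
from collections import defaultdict

def tpr_by_group(y_true, y_pred, groups, positive=1):
    """True Positive Rate (recall on the positive class) per subgroup."""
    tp, fn = defaultdict(int), defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yt == positive:
            if yp == positive:
                tp[g] += 1
            else:
                fn[g] += 1
    return {g: tp[g] / (tp[g] + fn[g])
            for g in tp.keys() | fn.keys() if tp[g] + fn[g] > 0}

def max_tpr_gap(y_true, y_pred, groups):
    """Largest pairwise TPR difference across subgroups (0 is ideal)."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())

def disparate_impact(y_pred, groups, positive=1):
    """Ratio of the lowest to the highest positive-prediction rate
    across subgroups; 1.0 is the parity ideal, lower means disparity."""
    pos, tot = defaultdict(int), defaultdict(int)
    for yp, g in zip(y_pred, groups):
        tot[g] += 1
        if yp == positive:
            pos[g] += 1
    rates = [pos[g] / tot[g] for g in tot]
    return min(rates) / max(rates)

# Toy example with two subgroups "A" and "B".
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(max_tpr_gap(y_true, y_pred, groups))   # TPR_A=0.75, TPR_B=0.5 -> gap 0.25
print(disparate_impact(y_pred, groups))
```

Disparate Impact is classically defined as the ratio of positive-outcome rates between an unprivileged and a privileged group; the min-over-max form used here extends that definition to more than two subgroups.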