Gender classification systems often inherit and amplify demographic imbalances in their training data. We first audit five widely used gender classification datasets, revealing that all suffer from significant intersectional underrepresentation. To measure the downstream impact of these flaws, we train identical MobileNetV2 classifiers on the two most balanced of these datasets, UTKFace and FairFace. Our fairness evaluation shows that even these models exhibit substantial bias, misclassifying female faces at a higher rate than male faces and amplifying existing racial skew. To counter these data-induced biases, we construct BalancedFace, a new public dataset created by blending images from FairFace and UTKFace, supplemented with images from other collections to fill missing demographic gaps. It is engineered to equalize subgroup shares across 189 intersections of age, race, and gender using only real, unedited images. When a standard classifier is trained on BalancedFace, it reduces the maximum True Positive Rate gap across racial subgroups by over 50% and brings the average Disparate Impact score 63% closer to the ideal of 1.0 compared to the next-best dataset, all with minimal loss of overall accuracy. These results underscore the value of data-centric interventions and provide an openly available resource for fair gender classification research.
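As a concrete illustration of the two fairness metrics named above, the following is a minimal sketch (not the paper's evaluation code; all data and group labels are hypothetical) of how the maximum True Positive Rate gap across subgroups and a min/max generalization of the Disparate Impact score can be computed:

```python
# Minimal sketch of the two fairness metrics from the abstract.
# NOTE: toy data and group labels are hypothetical; the paper's actual
# evaluation pipeline is not shown here.
from collections import defaultdict

def tpr_by_group(y_true, y_pred, groups, positive=1):
    """True Positive Rate (recall on the positive class) per subgroup."""
    tp, fn = defaultdict(int), defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yt == positive:
            if yp == positive:
                tp[g] += 1
            else:
                fn[g] += 1
    return {g: tp[g] / (tp[g] + fn[g])
            for g in tp.keys() | fn.keys() if tp[g] + fn[g] > 0}

def max_tpr_gap(y_true, y_pred, groups):
    """Largest pairwise TPR difference across subgroups (0 is ideal)."""
    rates = tpr_by_group(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())

def disparate_impact(y_pred, groups, positive=1):
    """Ratio of the lowest to the highest positive-prediction rate
    across subgroups; 1.0 is the parity ideal, lower means disparity."""
    pos, tot = defaultdict(int), defaultdict(int)
    for yp, g in zip(y_pred, groups):
        tot[g] += 1
        if yp == positive:
            pos[g] += 1
    rates = [pos[g] / tot[g] for g in tot]
    return min(rates) / max(rates)

# Toy example with two subgroups "A" and "B".
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(max_tpr_gap(y_true, y_pred, groups))   # TPR_A=0.75, TPR_B=0.5 -> gap 0.25
print(disparate_impact(y_pred, groups))
```

Disparate Impact is classically defined as the ratio of positive-outcome rates between an unprivileged and a privileged group; the min-over-max form used here extends that definition to more than two subgroups.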