This study investigates whether synthetic data can mitigate the demographic biases that affect face recognition technologies. Demographic biases can affect individuals from specific demographic groups, and can be identified by observing disparate performance of face recognition systems across those groups. They arise primarily from the unequal representation of demographic groups in the training data. Recently, synthetic data have emerged as a solution to some of the problems that affect face recognition systems. In particular, the generation process allows the desired demographic and facial attributes of the images to be specified, which makes it possible to control the demographic distribution of the synthesized dataset and to represent the different demographic groups fairly. We propose to fine-tune, on synthetic data, existing face recognition systems that exhibit demographic biases. We use synthetic datasets generated with GANDiffFace, a novel framework able to synthesize datasets for face recognition with controllable demographic distribution and realistic intra-class variations. We consider multiple datasets representing different demographic groups for training and evaluation. We also fine-tune different face recognition systems and evaluate their demographic fairness with different metrics. Our results support the proposed approach and the use of synthetic data to mitigate demographic biases in face recognition.
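The abstract states that demographic bias shows up as disparate performance across demographic groups. The abstract does not name the fairness metrics used, so the following is only a minimal sketch under stated assumptions: it takes the false non-match rate (FNMR) at a fixed threshold as the per-group performance measure, and summarizes disparity as the max-min gap and the population standard deviation across groups. The function names, the threshold, and the toy scores are all hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import pstdev


def per_group_fnmr(scores, is_genuine, groups, threshold):
    """Return the false non-match rate per group.

    FNMR is the share of genuine (same-identity) pairs whose similarity
    score falls below the decision threshold. Impostor pairs are ignored.
    """
    genuine = defaultdict(int)
    rejected = defaultdict(int)
    for score, same, group in zip(scores, is_genuine, groups):
        if not same:
            continue  # only genuine pairs contribute to FNMR
        genuine[group] += 1
        if score < threshold:
            rejected[group] += 1
    return {g: rejected[g] / genuine[g] for g in genuine}


def disparity(rates):
    """Summarize how unevenly an error rate is spread across groups."""
    values = list(rates.values())
    return {"gap": max(values) - min(values), "std": pstdev(values)}


# Toy comparison scores (hypothetical, for illustration only).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.85, 0.1]
is_genuine = [True, True, True, True, True, True, True, False]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = per_group_fnmr(scores, is_genuine, groups, threshold=0.5)
summary = disparity(rates)
print(rates)
print(summary)
```

Under these assumptions, a smaller gap or standard deviation after fine-tuning would indicate more uniform performance across groups; other fairness metrics could be substituted in `disparity` without changing the per-group computation.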