Recent work has shown that some modern facial recognition systems can discriminate against specific demographic groups and may lead to unfair treatment with respect to facial attributes such as gender and origin. The main reason is bias in the datasets used to train these models, in particular their unbalanced demographics. Unfortunately, collecting a large-scale dataset that is balanced with respect to various demographics is impractical. In this paper, we investigate an alternative: generating a balanced and possibly bias-free synthetic dataset that could be used to train, regularize, or evaluate deep learning-based facial recognition models. We propose a simple method for modeling and sampling a disentangled projection of a StyleGAN latent space to generate any combination of demographic groups (e.g. $hispanic-female$). Our experiments show that we can effectively synthesize any combination of demographic groups and that the generated identities differ from those in the original training dataset. We have also released the source code.
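To make the idea of "modeling and sampling a projection of a latent space" concrete, here is a minimal sketch of one possible reading, not the released implementation. Everything in it is an assumption made for illustration: the latents are random stand-ins (no StyleGAN generator is called), the projection is an ordinary least-squares linear map, each demographic combination is modeled by a single Gaussian in the projected space, and only two binary attributes are used for brevity. The names `fit_projection`, `fit_group_gaussians`, and `sample_group` are hypothetical.

```python
import numpy as np


def fit_projection(latents, scores):
    """Least-squares linear map P (k x d): P @ (w - mean) approximates the k attribute scores."""
    mean = latents.mean(axis=0)
    coef, *_ = np.linalg.lstsq(latents - mean, scores, rcond=None)  # coef has shape (d, k)
    return coef.T, mean


def fit_group_gaussians(proj, group_ids):
    """Fit one Gaussian (mean, covariance) to the projected coordinates of each group."""
    return {
        int(g): (proj[group_ids == g].mean(axis=0),
                 np.cov(proj[group_ids == g], rowvar=False))
        for g in np.unique(group_ids)
    }


def sample_group(n, group, P, mean, gaussians, rng):
    """Draw n latents whose projected coordinates follow the Gaussian of `group`."""
    mu, cov = gaussians[group]
    targets = rng.multivariate_normal(mu, cov, size=n)  # desired projections, shape (n, k)
    # Assumption: fresh latents come from a standard normal prior.
    base = rng.normal(size=(n, P.shape[1]))
    # Shift each latent only along the row space of P so that its projection
    # equals the sampled target; the orthogonal remainder is left untouched.
    delta = (targets - (base - mean) @ P.T) @ np.linalg.pinv(P).T
    return base + delta


# Toy data standing in for labeled latents (hypothetical, not real StyleGAN codes).
rng = np.random.default_rng(0)
d, n = 16, 2000
latents = rng.normal(size=(n, d))
true_dirs = rng.normal(size=(d, 2))        # two hidden binary attributes
scores = np.sign(latents @ true_dirs)      # +1 / -1 score per attribute
group_ids = (2 * (scores[:, 0] > 0) + (scores[:, 1] > 0)).astype(int)  # 4 combinations

P, mean = fit_projection(latents, scores)
gaussians = fit_group_gaussians((latents - mean) @ P.T, group_ids)

# A balanced set: the same number of samples for every attribute combination.
balanced = {g: sample_group(500, g, P, mean, gaussians, rng) for g in gaussians}
```

In a real pipeline the sampled latents would be passed to the generator to render faces, and the attribute scores would come from labeled or classifier-annotated images; both steps are outside this sketch.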