Toward responsible face datasets: modeling the distribution of a disentangled latent space for sampling face images from demographic groups

Recently, it has been exposed that some modern facial recognition systems could discriminate specific demographic groups and may lead to unfair attention with respect to various facial attributes such as gender and origin. The main reason are the biases inside datasets, unbalanced demographics, used to train theses models. Unfortunately, collecting a large-scale balanced dataset with respect to various demographics is impracticable. In this paper, we investigate as an alternative the generation of a balanced and possibly bias-free synthetic dataset that could be used to train, to regularize or to evaluate deep learning-based facial recognition models. We propose to use a simple method for modeling and sampling a disentangled projection of a StyleGAN latent space to generate any combination of demographic groups (e.g. $hispanic-female$). Our experiments show that we can synthesis any combination of demographic groups effectively and the identities are different from the original training dataset. We also released the source code.

翻译：近期研究表明，部分现代人脸识别系统可能对特定人口群体产生歧视，并导致对性别、种族等面部属性的不公平关注。其根本原因是训练这些模型的数据集存在偏差与非均衡的人口统计分布。然而，构建覆盖各类人口统计属性的均衡大规模数据集在实践中难以实现。本文探索替代方案，提出通过生成均衡且可能消除偏差的合成数据集，用于训练、正则化或评估基于深度学习的人脸识别模型。我们采用一种简化的方法，对StyleGAN潜空间中的解耦投影进行建模与采样，以生成任意人口群体组合（例如$hispanic-female$）。实验结果表明，该方法能有效合成各类人口群体组合，且生成样本的身份特征与原始训练数据集截然不同。我们已同步公开源代码。

相关内容

GROUP

关注 1

Group一直是研究计算机支持的合作工作、人机交互、计算机支持的协作学习和社会技术研究的主要场所。该会议将社会科学、计算机科学、工程、设计、价值观以及其他与小组工作相关的多个不同主题的工作结合起来，并进行了广泛的概念化。官网链接：https://group.acm.org/conferences/group20/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日