Uncovering Bias in Face Generation Models

Recent advances in GANs and diffusion models have enabled the creation of high-resolution, hyper-realistic images. However, these models may misrepresent certain social groups and exhibit bias. Understanding bias in these models remains an important research question, especially for tasks that support critical decision-making and could affect minorities. The contribution of this work is a novel analysis covering architectures and embedding spaces for a fine-grained understanding of bias across three approaches: generators, attribute modifiers, and post-processing bias mitigators. This work shows that generators suffer from bias across all social groups, with attribute preferences such as 75%-85% for whiteness and 60%-80% for the female gender (for all models trained on CelebA) and low probabilities of generating children and older men. Modifiers and mitigators work as post-processors and change the generator's performance. For instance, attribute channel perturbation strategies modify the embedding spaces. We quantify the influence of this change on group fairness by measuring the impact on image quality and group features. Specifically, we use the Fréchet Inception Distance (FID), the Face Matching Error, and the Self-Similarity score. For InterFaceGAN, we analyze one- and two-attribute channel perturbations and examine their effect on the fairness distribution and image quality. Finally, we analyze the post-processing bias mitigators, which are the fastest and most computationally efficient way to mitigate bias. We find that these mitigation techniques yield similar results on KL divergence and FID score; however, self-similarity scores show a different feature concentration on the new groups of the data distribution. The weaknesses and ongoing challenges described in this work must be considered in the pursuit of fair and unbiased face generation models.
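The abstract names KL divergence as one of the measures used to compare mitigators but does not say how it is computed. As a minimal sketch, assuming that group fairness is scored as the KL divergence between the empirical distribution of a predicted attribute over generated faces and a uniform target, the calculation could look like the following. The function names `kl_divergence` and `group_fairness_gap`, the uniform target, and the sample labels are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions given as counts or probabilities."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that raw counts are accepted as input.
    p = p / p.sum()
    q = q / q.sum()
    # Clip to avoid log(0) when a group is never generated.
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))


def group_fairness_gap(labels, num_groups):
    """Divergence of the observed group frequencies from a uniform target (assumed here)."""
    counts = np.bincount(labels, minlength=num_groups)
    uniform = np.full(num_groups, 1.0 / num_groups)
    return kl_divergence(counts, uniform)


# Hypothetical attribute labels for 100 generated faces: 80 in group 0, 20 in group 1.
skewed = np.array([0] * 80 + [1] * 20)
balanced = np.array([0] * 50 + [1] * 50)

print(group_fairness_gap(skewed, num_groups=2))    # larger value: further from uniform
print(group_fairness_gap(balanced, num_groups=2))  # zero for a perfectly balanced sample
```

A score of zero means the generated sample matches the target exactly, and the score grows as one group dominates. Other targets, such as census proportions, could be passed to `kl_divergence` directly in place of the uniform vector.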
