Generative adversarial networks (GANs) generate photorealistic faces that humans often cannot distinguish from real faces. While biases in machine learning models are often assumed to stem from biases in the training data, we find pathological internal color and luminance biases in the discriminator of a pre-trained StyleGAN3-r model that are not explicable by the training data. We also find that the discriminator systematically stratifies scores by both image- and face-level qualities, and that this stratification disproportionately affects images across gender, race, and other categories. We examine axes common in research on stereotyping in social psychology.