Deep generative models require large amounts of training data. This often poses a problem, as collecting datasets can be expensive and difficult, particularly datasets that are representative of the appropriate underlying distribution (e.g., demographic). The resulting dataset biases are then propagated into the models trained on them. We present an approach to construct an unbiased generative adversarial network (GAN) from an existing biased GAN by rebalancing the model distribution. We do so by using an evolutionary algorithm to generate balanced data from an existing imbalanced deep generative model, and then using these data to train a balanced generative model. Additionally, we propose a bias mitigation loss function that minimizes the deviation of the learned class distribution from the equiprobable (uniform) distribution. We show results for StyleGAN2 models trained on the Flickr Faces High Quality (FFHQ) dataset with respect to racial fairness, and find that the proposed approach improves the fairness metric by almost 5 times while maintaining image quality. We further validate our approach on an imbalanced CIFAR10 dataset, where we obtain fairness and image quality comparable to those achieved by training on a balanced CIFAR10 dataset twice as large. Lastly, we argue that traditionally used image quality metrics such as Fréchet inception distance (FID) are unsuitable for scenarios where the class distributions are imbalanced and a balanced reference set is not available.
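The abstract does not give the exact form of the bias mitigation loss. One plausible reading, sketched here under our own assumptions, is the KL divergence between the generator's batch-averaged class distribution (estimated by a hypothetical attribute classifier applied to generated images) and the uniform distribution over K classes: sum over c of p_c * log(K * p_c). It is zero exactly when the classes are equiprobable and reaches log(K) when one class takes all the mass. Plain Python is used for readability; in training, the same expression would be written in an autodiff framework so that gradients reach the generator.

```python
import math
from typing import Sequence


def bias_mitigation_loss(class_probs: Sequence[Sequence[float]]) -> float:
    """KL(p_bar || uniform), where p_bar is the batch-averaged class distribution.

    class_probs: one row per generated sample; each row is the softmax output of a
    (hypothetical) attribute classifier over K classes.
    Returns 0 when the generator's classes are equiprobable and log(K) when a
    single class takes all of the probability mass.
    """
    batch = len(class_probs)
    k = len(class_probs[0])
    # Average per-sample probabilities to estimate the generator's class distribution.
    p_bar = [sum(row[c] for row in class_probs) / batch for c in range(k)]
    # sum_c p_c * log(p_c / (1/K)); classes with p_c == 0 contribute nothing.
    return sum(p * math.log(p * k) for p in p_bar if p > 0)


if __name__ == "__main__":
    balanced = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    collapsed = [[1, 0, 0, 0]] * 4
    print(bias_mitigation_loss(balanced))   # 0.0
    print(bias_mitigation_loss(collapsed))  # log(4), about 1.386
```

Note that the loss acts on the batch average, not on each sample: individual images can (and should) be confidently classified, as long as the classes are equally represented across the batch.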
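The evolutionary step can be illustrated with a minimal truncation-selection search over latent vectors: keep the fittest latents, refill the population with Gaussian mutations of them, and use the classifier probability of a target class as the fitness. Running the search the same number of times for every class yields a class-balanced set of latents, which could then be decoded into a balanced training set. This is a sketch, not necessarily the authors' algorithm: the linear `toy_class_probs` function is a hypothetical stand-in for "generator followed by attribute classifier", and all names and hyperparameters (`pop_size`, `n_parents`, `sigma`, `generations`) are assumptions. With a real StyleGAN2, one would likely also penalize latents that drift far from the prior so that the generated images stay realistic.

```python
import math
import random
from typing import Callable, Dict, List

Latent = List[float]


def toy_class_probs(z: Latent, weights: List[Latent]) -> List[float]:
    """Hypothetical stand-in for generator + attribute classifier: softmax of linear logits."""
    logits = [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def evolve_latent(
    fitness: Callable[[Latent], float],
    dim: int,
    rng: random.Random,
    pop_size: int = 32,
    n_parents: int = 8,
    sigma: float = 0.3,
    generations: int = 50,
) -> Latent:
    """Truncation-selection evolutionary search that maximizes `fitness` over latent vectors."""
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best n_parents latents survive unchanged...
        population.sort(key=fitness, reverse=True)
        parents = population[:n_parents]
        # ...and the rest of the population is refilled with mutated copies of them.
        children = [
            [x + rng.gauss(0.0, sigma) for x in rng.choice(parents)]
            for _ in range(pop_size - n_parents)
        ]
        population = parents + children
    return max(population, key=fitness)


def balanced_latents(
    weights: List[Latent], per_class: int, rng: random.Random
) -> Dict[int, List[Latent]]:
    """Search for the same number of latents for every class, giving a class-balanced set."""
    dim = len(weights[0])
    result: Dict[int, List[Latent]] = {}
    for c in range(len(weights)):
        # The default argument binds the current class index into the fitness function.
        def fitness(z: Latent, c: int = c) -> float:
            return toy_class_probs(z, weights)[c]

        result[c] = [evolve_latent(fitness, dim, rng) for _ in range(per_class)]
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # Three classes in a 4-D latent space (toy weights).
    weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    data = balanced_latents(weights, per_class=5, rng=rng)
    for c, zs in data.items():
        print(c, [round(toy_class_probs(z, weights)[c], 3) for z in zs])
```

Because the best latents are always carried over, the fitness of the population never decreases from one generation to the next, which keeps this simple scheme stable even with a small population.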
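The argument about FID can be made concrete with a toy calculation. FID summarizes the feature distributions of generated and reference images by their means and covariances and computes the Fréchet distance between the corresponding Gaussians; in one dimension this reduces to (mu1 - mu2)^2 + (sigma1 - sigma2)^2. The sketch assumes 1-D features from two classes with unit variance and means 0 and 4, and a biased reference set containing 10% of the minority class (all numbers hypothetical, not taken from the paper). Against that reference, a generator that copies the bias scores exactly 0, while a perfectly balanced generator scores about 3.01; against a balanced reference the ranking flips, which is why the availability of a balanced reference set matters.

```python
import math
from typing import Tuple


def mixture_moments(
    minority_weight: float, mean_a: float = 0.0, mean_b: float = 4.0, std: float = 1.0
) -> Tuple[float, float]:
    """Mean and variance of 1-D features from a two-class mixture (class B has weight minority_weight)."""
    mean = (1.0 - minority_weight) * mean_a + minority_weight * mean_b
    # Law of total variance: within-class variance + variance of the class means.
    var = std**2 + minority_weight * (1.0 - minority_weight) * (mean_b - mean_a) ** 2
    return mean, var


def frechet_distance_1d(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """Fréchet distance between two 1-D Gaussians, the scalar case of the FID formula."""
    return (mu1 - mu2) ** 2 + (math.sqrt(var1) - math.sqrt(var2)) ** 2


if __name__ == "__main__":
    biased_ref = mixture_moments(0.1)     # biased reference set: 10% class B
    balanced_ref = mixture_moments(0.5)   # balanced reference set
    biased_gen = mixture_moments(0.1)     # generator that reproduces the bias
    balanced_gen = mixture_moments(0.5)   # generator with equiprobable classes

    print(frechet_distance_1d(*biased_gen, *biased_ref))      # 0.0
    print(frechet_distance_1d(*balanced_gen, *biased_ref))    # about 3.014
    print(frechet_distance_1d(*balanced_gen, *balanced_ref))  # 0.0
```

The mixture features are not Gaussian, but FID only ever uses their first two moments, so matching the mean and variance computed above is exactly what the metric rewards.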