Interest in fair generative models has grown recently. In this work, we conduct, for the first time, an in-depth study of fairness measurement, a critical component in gauging progress on fair generative models. We make three contributions. First, we show that the existing fairness measurement framework has considerable measurement errors, even when highly accurate sensitive attribute (SA) classifiers are used. These findings cast doubt on previously reported fairness improvements. Second, to address this issue, we propose CLassifier Error-Aware Measurement (CLEAM), a new framework that uses a statistical model to account for inaccuracies in SA classifiers. CLEAM reduces measurement errors significantly, e.g., 4.98% $\rightarrow$ 0.62% for StyleGAN2 w.r.t. Gender, and it does so with minimal additional overhead. Third, we use CLEAM to measure fairness in important text-to-image generators and GANs, revealing considerable biases in these models that raise concerns about their applications. Code and more resources: https://sutd-visual-computing-group.github.io/CLEAM/.
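To illustrate why even a highly accurate SA classifier can distort a fairness measurement, the sketch below uses a standard misclassification correction for a binary attribute. It is not the CLEAM statistical model, which the abstract does not specify. The function names and the per-class accuracies `acc_pos` and `acc_neg` are hypothetical, and the numbers are illustrative only.

```python
def observed_proportion(p_true: float, acc_pos: float, acc_neg: float) -> float:
    """Expected fraction of samples the classifier labels positive.

    p_true  : true fraction of positive-class samples (assumed known here)
    acc_pos : probability a positive sample is labeled positive
    acc_neg : probability a negative sample is labeled negative
    """
    return p_true * acc_pos + (1.0 - p_true) * (1.0 - acc_neg)


def corrected_proportion(p_hat: float, acc_pos: float, acc_neg: float) -> float:
    """Invert the relation above to estimate the true proportion.

    This is a generic point-estimate correction, shown only as an
    illustration; it is an assumption, not the method of the paper.
    """
    denom = acc_pos + acc_neg - 1.0
    if denom <= 0.0:
        raise ValueError("classifier must be better than chance")
    p = (p_hat - (1.0 - acc_neg)) / denom
    # Clamp to the valid range, since sampling noise can push p outside [0, 1].
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    # Illustrative values: a generator that is 90% positive-class,
    # measured with a classifier that is 98% / 95% accurate per class.
    p_hat = observed_proportion(0.9, acc_pos=0.98, acc_neg=0.95)
    print(f"naive estimate:     {p_hat:.4f}")
    print(f"corrected estimate: {corrected_proportion(p_hat, 0.98, 0.95):.4f}")
```

In this toy setting the naive estimate deviates from the true proportion even though both per-class accuracies exceed 95%, which is the kind of measurement error the abstract describes.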