Recently, prompt learning has emerged as the state-of-the-art (SOTA) approach for fair text-to-image (T2I) generation. Specifically, it leverages readily available reference images to learn inclusive prompts for each target Sensitive Attribute (tSA), enabling fair image generation. In this work, we first reveal that this prompt-learning-based approach degrades sample quality. Our analysis shows that its training objective, which aims to align the embedding differences of learned prompts and reference images, can be sub-optimal, distorting the learned prompts and degrading the generated images. To further substantiate this claim, as our major contribution, we dive deep into the denoising subnetwork of the T2I model and trace the effect of these learned prompts by analyzing the cross-attention maps. In this analysis, we propose a novel prompt-switching analysis (I2H and H2I) together with a new quantitative characterization of cross-attention maps. Our analysis reveals abnormalities in the early denoising steps that perpetuate an improper global structure and degrade the generated samples. Building on these insights, we propose two ideas, (i) Prompt Queuing and (ii) Attention Amplification, to address the quality issue. Extensive experiments on a wide range of tSAs show that our proposed method outperforms the SOTA approach in image generation quality while achieving competitive fairness. More resources are available at the FairQueue project site: https://sutd-visual-computing-group.github.io/FairQueue
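To make the two ideas named in the abstract concrete, the sketch below illustrates how a sampling loop could realize them: early denoising steps are conditioned on the base (neutral) prompt so the global structure forms cleanly (Prompt Queuing), after which the learned inclusive prompt takes over with the cross-attention of its tSA token scaled up (Attention Amplification). This is a minimal illustration under assumed interfaces, not the authors' implementation; `encode_prompt`, `unet`, `scheduler`, `switch_step`, `attn_scale`, and the `set_tsa_attention_scale` hook are hypothetical placeholders.

```python
import torch


@torch.no_grad()
def sample_with_prompt_queuing(unet, scheduler, encode_prompt,
                               base_prompt, learned_prompt,
                               num_steps=50, switch_step=10, attn_scale=2.0,
                               latent_shape=(1, 4, 64, 64)):
    """Sketch: queue the base prompt first, then the learned inclusive prompt.

    All objects passed in (unet, scheduler, encode_prompt) are assumed
    callables/modules with the simple interfaces used below; the attention
    scaling hook stands in for whatever mechanism rescales the cross-attention
    weights of the learned tSA token inside the denoising U-Net.
    """
    base_emb = encode_prompt(base_prompt)        # e.g. "a photo of a person"
    learned_emb = encode_prompt(learned_prompt)  # base prompt + learned tSA token

    latents = torch.randn(latent_shape)
    scheduler.set_timesteps(num_steps)

    for i, t in enumerate(scheduler.timesteps):
        if i < switch_step:
            # Prompt Queuing: early steps use the base prompt so the global
            # structure is laid down without interference from the learned token.
            cond, scale = base_emb, 1.0
        else:
            # Later steps: switch to the learned inclusive prompt and amplify
            # its tSA token's cross-attention so the attribute is still expressed.
            cond, scale = learned_emb, attn_scale

        unet.set_tsa_attention_scale(scale)        # hypothetical amplification hook
        noise_pred = unet(latents, t, cond)        # predicted noise at step t
        latents = scheduler.step(noise_pred, t, latents)  # assumed to return updated latents

    return latents
```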