The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a distorted worldview and limit opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) biased direct finetuning of a diffusion model's sampling process, which leverages a biased gradient to more effectively optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives on fairness beyond absolute equality, which we demonstrate by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including the corresponding prompts in the finetuning data. We hope our work facilitates the social alignment of text-to-image (T2I) generative AI. We will share code and various debiased diffusion model adaptors.
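To make the distributional alignment framing concrete, the following is a minimal sketch and not the loss used in this work. It assumes a hypothetical attribute classifier that returns per-image class probabilities, and it substitutes a plain KL divergence between the user-defined target distribution and the batch-averaged predictions. The function names `batch_distribution` and `alignment_loss` and the example numbers are illustrative assumptions.

```python
import math


def batch_distribution(probs):
    """Average per-image class probabilities into one batch-level distribution."""
    n = len(probs)
    k = len(probs[0])
    return [sum(p[j] for p in probs) / n for j in range(k)]


def alignment_loss(probs, target, eps=1e-12):
    """KL(target || batch distribution): a stand-in alignment objective (assumption).

    It is zero when the batch-level distribution equals the target and
    positive otherwise.
    """
    q = batch_distribution(probs)
    return sum(
        t * math.log((t + eps) / (qj + eps))
        for t, qj in zip(target, q)
        if t > 0
    )


# Hypothetical classifier outputs [p(young), p(old)] for four generated images.
skewed = [[0.9, 0.1], [0.95, 0.05], [0.9, 0.1], [0.85, 0.15]]
matched = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Target from the abstract: 75% young, 25% old.
target = [0.75, 0.25]

loss_skewed = alignment_loss(skewed, target)    # batch is about 90% young
loss_matched = alignment_loss(matched, target)  # batch is exactly 75% young
```

In this sketch the loss is computed on a batch rather than per image, which reflects the idea that fairness here is a property of the distribution of generated images. In an actual finetuning setup the same quantity would need to be expressed in an automatic differentiation framework so that gradients can reach the trainable parameters.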