Mitigating biases in generative AI, and particularly in text-to-image models, is of high importance given their growing impact on society. The biased datasets used for training pose challenges to the responsible development of these models, and mitigation through hard prompting or embedding alteration is the most common current solution. Our work introduces a novel approach to achieving diverse and inclusive synthetic images by learning a direction in the latent space and modifying only the initial Gaussian noise provided to the diffusion process. While keeping the prompt neutral and the embeddings untouched, this approach successfully adapts to diverse debiasing scenarios, such as geographical biases. Moreover, our work shows that these learned latent directions can be linearly combined to introduce new mitigations and, if desired, integrated with text-embedding adjustments. Furthermore, text-to-image models lack transparency for assessing bias in their outputs unless the images are visually inspected. We therefore provide a tool that empowers developers to select the concepts they wish to mitigate. The project page with code is available online.
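To make the mechanism concrete, the following is a minimal sketch, not the paper's released implementation, of how a learned latent direction could be added to the initial Gaussian noise of a Stable Diffusion run via the `latents` argument of the diffusers pipeline; the direction files, mixing weights, and prompt are hypothetical placeholders, and the second direction illustrates the linear combination of mitigations described above.

```python
# Sketch: debiasing by shifting only the initial noise latents.
# The text prompt and its embeddings are left untouched.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(0)
# Standard initial Gaussian noise for a 512x512 image (4 x 64 x 64 latents).
noise = torch.randn((1, 4, 64, 64), generator=generator,
                    device="cuda", dtype=torch.float16)

# Hypothetical pre-learned directions in the same latent space; a second
# direction is blended in linearly to combine two mitigations.
d_geo = torch.load("direction_geographic.pt").to("cuda", torch.float16)
d_other = torch.load("direction_other.pt").to("cuda", torch.float16)
alpha, beta = 1.0, 0.5  # illustrative mixing weights
latents = noise + alpha * d_geo + beta * d_other

# Only the initial latents differ from a vanilla run; the neutral prompt
# and its text embeddings are exactly those of the unmodified pipeline.
image = pipe("a photo of a house", latents=latents).images[0]
image.save("debiased_sample.png")
```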