Text-to-image generation models have recently achieved astonishing results in image quality, flexibility, and text alignment, and are consequently employed in a fast-growing number of applications. Through improvements in multilingual abilities, a larger community now has access to this technology. However, our results show that multilingual models suffer from significant gender biases just as monolingual models do. Furthermore, the natural expectation that multilingual models will provide similar results across languages does not hold up. Instead, there are important differences between languages. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. We use MAGBIG to investigate the effect of multilingualism on gender bias in T2I models. To this end, we construct multilingual prompts requesting portraits of people with a certain occupation or trait. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages. Furthermore, we investigate prompt engineering strategies, such as indirect, neutral formulations, to mitigate these biases. Unfortunately, these approaches have limited success and result in worse text-to-image alignment. Consequently, we call for more research into diverse representations across languages in image generators, as well as into steerability to address biased model behavior.