Text-to-image (T2I) generation models have recently achieved astonishing results in image quality, flexibility, and text alignment, and are consequently employed in a fast-growing number of applications. Through improvements in multilingual abilities, a larger community now has access to this kind of technology. Yet, as we will show, multilingual models suffer from (gender) biases just as monolingual models do. Furthermore, one would naturally expect these models to provide similar results across languages, but this is not the case: there are important differences between languages. We therefore propose MAGBIG, a novel benchmark intended to foster research on multilingual models without gender bias. With MAGBIG, we investigate whether multilingual T2I models magnify gender bias. To this end, we use multilingual prompts requesting portrait images of persons with a certain occupation or trait (expressed by adjectives). Our results show not only that models deviate from the normative assumption that each gender should be equally likely to be generated, but also that there are large differences across languages. We also investigate prompt engineering strategies, i.e., the use of indirect, neutral formulations, as a possible remedy for these biases. Unfortunately, these strategies help only to a limited extent and result in worse text-to-image alignment. Consequently, this work calls for more research into diverse representations across languages in image generators.