Recent studies have demonstrated how to assess stereotypical bias in pre-trained English language models. In this work, we extend this line of research along several dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple languages. To that end, we use the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, that English (monolingual) models exhibit the strongest bias, and that the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage the extension of our work to other languages.