Warning: this work contains upsetting or disturbing content. Large language models (LLMs) tend to learn the social and cultural biases present in their raw pre-training data. To test whether an LLM's behavior is fair, functional datasets are employed; because of their purpose, these datasets are highly language- and culture-specific. In this paper, we address a gap in the scope of multilingual bias evaluation by presenting a bias detection dataset specifically designed for the Russian language, dubbed RuBia. The RuBia dataset is divided into four domains: gender, nationality, socio-economic status, and diverse. Each domain is further divided into multiple fine-grained subdomains. Every example in the dataset consists of two sentences: the first reinforces a potentially harmful stereotype or trope, and the second contradicts it. These sentence pairs were first written by volunteers and then validated by native-speaking crowdsourcing workers. Overall, there are nearly 2,000 unique sentence pairs spread over 19 subdomains in RuBia. To illustrate the dataset's purpose, we conduct a diagnostic evaluation of state-of-the-art or near-state-of-the-art LLMs and discuss the LLMs' predisposition to social biases.
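The pair structure described in the abstract can be sketched in code. The following is a minimal illustration only, not the authors' implementation: the `SentencePair` fields, the `pro_trope_preference_rate` helper, and the comparison rule (counting how often a model scores the trope-reinforcing sentence above the contradicting one) are all assumptions made for this sketch, and the abstract does not specify the metric actually used in the paper. The `score` argument stands in for any sentence-level score a language model might provide, such as a log-likelihood.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SentencePair:
    """One dataset example (field names are hypothetical)."""
    domain: str      # one of: gender, nationality, socio-economic status, diverse
    subdomain: str   # fine-grained subdomain label (hypothetical naming)
    pro_trope: str   # sentence reinforcing a potentially harmful trope
    anti_trope: str  # sentence contradicting that trope


def pro_trope_preference_rate(
    pairs: Iterable[SentencePair],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where `score` rates the pro-trope sentence strictly higher.

    This is an assumed, illustrative metric, not the one defined in the paper.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    preferred = sum(1 for p in pairs if score(p.pro_trope) > score(p.anti_trope))
    return preferred / len(pairs)


if __name__ == "__main__":
    # Placeholder strings, not taken from the dataset.
    demo = [
        SentencePair("gender", "subdomain_a", "placeholder sentence one", "short"),
        SentencePair("nationality", "subdomain_b", "short", "placeholder sentence two"),
    ]
    # A toy scorer (string length) stands in for a real model score.
    print(pro_trope_preference_rate(demo, score=len))
```

Under this assumed metric, a rate near 0.5 would mean the scorer shows no systematic preference between the two sentences of a pair, while a rate well above 0.5 would suggest a preference for the trope-reinforcing variant.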