Recent instruction fine-tuned (IFT) models can solve multiple NLP tasks when prompted to do so, with machine translation (MT) being a prominent use case. However, current research often focuses on standard performance benchmarks and leaves pressing fairness and ethical considerations behind. In MT, this can lead to misgendered translations, which, among other harms, perpetuate stereotypes and prejudices. In this work, we address this gap by investigating whether and to what extent such models exhibit gender bias in machine translation, and how it can be mitigated. Concretely, we compute established gender bias metrics on the WinoMT corpus for translation from English into German and Spanish. We find that IFT models default to male-inflected translations, even disregarding female occupational stereotypes. Next, using interpretability methods, we show that in misgendered translations models systematically overlook the pronoun that indicates the gender of the target occupation. Finally, based on this finding, we propose an easy-to-implement and effective bias mitigation solution based on few-shot learning that leads to significantly fairer translations.
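To make the "established gender bias metrics" more concrete, the following is a minimal sketch of the three scores commonly reported on WinoMT: overall gender accuracy, ΔG (masculine F1 minus feminine F1), and ΔS (accuracy on pro-stereotypical minus anti-stereotypical examples). It assumes the gender of the translated occupation has already been extracted from each translation; the `Example` record, its field names, and the toy data are hypothetical and serve only as illustration, not as the evaluation code used in this work.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record; the field names are assumptions for illustration.
    gold: str        # gold gender of the target occupation: "male" or "female"
    pred: str        # gender found in the translation: "male", "female" or "unknown"
    stereotype: str  # "pro" or "anti" with respect to occupational stereotypes


def accuracy(examples):
    """Share of examples whose translated gender matches the gold gender."""
    if not examples:
        return 0.0
    return sum(e.gold == e.pred for e in examples) / len(examples)


def f1(examples, gender):
    """F1 score for one gender class, treated as the positive class."""
    tp = sum(e.gold == gender and e.pred == gender for e in examples)
    fp = sum(e.gold != gender and e.pred == gender for e in examples)
    fn = sum(e.gold == gender and e.pred != gender for e in examples)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bias_metrics(examples):
    """Return accuracy, delta_g (male F1 - female F1) and delta_s (pro - anti accuracy)."""
    pro = [e for e in examples if e.stereotype == "pro"]
    anti = [e for e in examples if e.stereotype == "anti"]
    return {
        "accuracy": accuracy(examples),
        "delta_g": f1(examples, "male") - f1(examples, "female"),
        "delta_s": accuracy(pro) - accuracy(anti),
    }


# Toy data: one female anti-stereotypical example is translated as male.
toy = [
    Example(gold="male", pred="male", stereotype="pro"),
    Example(gold="male", pred="male", stereotype="anti"),
    Example(gold="female", pred="male", stereotype="anti"),
    Example(gold="female", pred="female", stereotype="pro"),
]
metrics = bias_metrics(toy)
print(metrics)
```

In this sketch, positive values of `delta_g` indicate better performance on masculine than on feminine referents, and positive values of `delta_s` indicate that the system does better when the gold gender agrees with the occupational stereotype.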