While large language models achieve state-of-the-art results across a wide range of natural language processing (NLP) tasks, they remain prone to systematic biases. Among these, gender bias is particularly salient in machine translation (MT), because languages differ systematically in whether and how they mark gender. As a result, translation often requires resolving implicit source-side cues into explicitly gender-marked target forms. Standard benchmarks may capture broad disparities, but they fail to reflect the full complexity of gender bias in modern MT. In this paper, we extend recent bias evaluation frameworks by (i) introducing a novel measure, which we term "Prior Bias", that captures a model's default gender assumptions, and (ii) applying the framework to decoder-only MT models. Our results show that, despite their scale and state-of-the-art status, decoder-only models do not generally outperform encoder-decoder architectures on gender-specific metrics; however, post-training (e.g., instruction tuning) not only improves contextual awareness but also reduces the masculine Prior Bias.