Translation systems, including foundation models capable of translation, can produce errors that result in gender mistranslation, and such errors can be especially harmful. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts, including several traditionally under-represented in digital resources. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both neural machine translation systems and foundation models, and show that all systems exhibit gender mistranslation and potential harm, even in high-resource languages.