Despite major advances in multilingual modeling, large quality disparities persist across languages. Besides the obvious impact of uneven training resources, typological properties have also been proposed to determine the intrinsic difficulty of modeling a language. The existing evidence, however, is mostly based on small monolingual language models or bilingual translation models trained from scratch. We expand on this line of work by analyzing two large pre-trained multilingual translation models, NLLB-200 and Tower+, which are state-of-the-art representatives of encoder-decoder and decoder-only machine translation, respectively. Based on a broad set of languages, we find that target language typology drives translation quality of both models, even after controlling for more trivial factors, such as data resourcedness and writing script. Additionally, languages with certain typological properties benefit more from a wider search of the output space, suggesting that such languages could profit from alternative decoding strategies beyond the standard left-to-right beam search. To facilitate further research in this area, we release a set of fine-grained typological properties for 212 languages of the FLORES+ MT evaluation benchmark.