Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks. Self-supervised pretrained models are often fine-tuned on parallel data from one or multiple language pairs for machine translation. Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive. Training a new adapter on each language pair or training a single adapter on all language pairs without updating the pretrained model has been proposed as a parameter-efficient alternative. However, the former does not permit any sharing between languages, while the latter shares parameters for all languages and is susceptible to negative interference. In this paper, we propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer. Our approach outperforms related baselines, yielding higher translation scores on average when translating from English to 17 different low-resource languages. We also show that language-family adapters provide an effective method to translate to languages unseen during pretraining.
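To make the difference between the three adapter-sharing schemes concrete, the following is a minimal sketch of how an English-to-target pair could be routed to an adapter under each scheme. The language list, the family labels, and the names `FAMILY_OF`, `adapter_key`, and `count_adapters` are assumptions made for illustration only; they are not the paper's language set or implementation, and no model is actually trained here.

```python
# Hypothetical language-to-family map, chosen only for illustration.
# It is NOT the set of 17 languages evaluated in the paper.
FAMILY_OF = {
    "bg": "balto-slavic",   # Bulgarian
    "sr": "balto-slavic",   # Serbian
    "id": "austronesian",   # Indonesian
    "ms": "austronesian",   # Malay
    "fa": "indo-iranian",   # Persian
    "hi": "indo-iranian",   # Hindi
}


def adapter_key(target_lang: str, scheme: str) -> str:
    """Return the name of the adapter an en->target_lang pair would use.

    scheme = "pair":   one adapter per language pair (no sharing)
    scheme = "shared": one adapter for all pairs (full sharing)
    scheme = "family": one adapter per language family (partial sharing)
    """
    if scheme == "pair":
        return f"en-{target_lang}"
    if scheme == "shared":
        return "all"
    if scheme == "family":
        return FAMILY_OF[target_lang]
    raise ValueError(f"unknown scheme: {scheme}")


def count_adapters(scheme: str) -> int:
    """Number of distinct adapters needed to cover every language in FAMILY_OF."""
    return len({adapter_key(lang, scheme) for lang in FAMILY_OF})


if __name__ == "__main__":
    for scheme in ("pair", "family", "shared"):
        print(scheme, count_adapters(scheme))
```

Under these assumptions, the per-pair scheme needs one adapter per language, the shared scheme needs a single adapter, and the family scheme sits in between: related languages such as `bg` and `sr` resolve to the same adapter, while unrelated ones stay separate. In every case the pretrained model's own parameters would remain frozen.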