Zero-shot cross-lingual generation involves finetuning a multilingual pretrained language model on a generation task in one language and then using it to make predictions for that task in other languages. Previous work has observed that such models frequently generate text in the wrong language and has proposed approaches to address this, usually with mT5 as the backbone model. In this work, we compare various approaches proposed in the literature in a unified setting, and also include alternative backbone models, namely mBART and NLLB-200. We first underline the importance of tuning the learning rate used for finetuning, which substantially alleviates the problem of generation in the wrong language. Then, we show that with careful learning rate tuning, simple full finetuning of the model is a very strong baseline, and alternative approaches bring only marginal improvements. Finally, we find that mBART performs similarly to mT5 of the same size, and that NLLB-200 can be competitive in some cases. Our final models reach the performance of the approach based on data translation, which is usually considered an upper baseline for zero-shot cross-lingual generation.
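To make the wrong-language problem concrete, the following is a minimal sketch of how a wrong-language rate could be measured and compared across candidate learning rates. It is an illustration under stated assumptions, not the procedure used in this work: the helper names (`dominant_script`, `wrong_language_rate`, `pick_learning_rate`) are hypothetical, the script check based on Unicode character names is a crude stand-in for a real language identifier (it cannot separate languages that share a script), and the example strings are placeholders rather than model outputs.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most frequent script prefix among the letters of `text`.

    Crude proxy for language identification: the first word of a Unicode
    character name is usually its script, e.g. "LATIN" or "CYRILLIC".
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "NONE"


def wrong_language_rate(outputs, expected_script):
    """Fraction of generated outputs whose dominant script is not the expected one."""
    if not outputs:
        return 0.0
    wrong = sum(1 for text in outputs if dominant_script(text) != expected_script)
    return wrong / len(outputs)


def pick_learning_rate(outputs_by_lr, expected_script):
    """Pick the candidate learning rate with the lowest wrong-language rate.

    Assumption: selection uses only this rate; in practice one would also
    take the task metric on validation data into account.
    """
    return min(
        outputs_by_lr,
        key=lambda lr: wrong_language_rate(outputs_by_lr[lr], expected_script),
    )


# Placeholder outputs for a Russian target (illustrative strings, not real results).
outputs_by_lr = {
    1e-3: ["hello world", "привет мир"],
    1e-4: ["привет мир", "добрый день"],
}
best_lr = pick_learning_rate(outputs_by_lr, "CYRILLIC")
```

In this toy setup, the larger learning rate yields one English output out of two, so the smaller one is selected. A real evaluation would replace the script heuristic with a proper language identification model.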