Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM) finetuned on a task in one language to make predictions for that task in other languages. Although this setting has been broadly studied for natural language understanding tasks, it remains understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually with mT5 as the backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering both full finetuning and parameter-efficient finetuning with adapters. We find that mBART with adapters performs similarly to mT5 of the same size, and that NLLB-200 can be competitive in some cases. We also underline the importance of tuning the learning rate used for finetuning, which helps to alleviate the problem of generation in the wrong language.