Modular deep learning has been proposed for the efficient adaptation of pre-trained models to new tasks, domains, and languages. In particular, combining language adapters with task adapters has shown potential where no supervised data exists for a language. In this paper, we explore the role of language adapters in zero-shot cross-lingual transfer for natural language understanding (NLU) benchmarks. We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets. Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages, and models. Retaining the source-language adapter instead often leads to equivalent, and sometimes better, performance. Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.