As an effective alternative to direct fine-tuning on target tasks in specific languages, cross-lingual transfer addresses the challenge of limited training data by decoupling ''task ability'' and ''language ability'': the former is learned by fine-tuning on the target task in the source language, the latter by fine-tuning on another selected task in the target language. However, existing methods fail to fully separate the task ability from the source language or the language ability from the chosen task. In this paper, we acknowledge the mutual reliance between task ability and language ability and direct our attention toward the gap between the target language and the source language on a given task. As the gap removes the impact of tasks, we assume that it remains consistent across tasks. Based on this assumption, we propose a new cross-lingual transfer method called $\texttt{AdaMergeX}$ that utilizes adaptive adapter merging. With a reference task, the assumption implies that the divergence between adapters fine-tuned on the reference task in the two languages follows the same distribution as the divergence between adapters fine-tuned on the target task in the two languages. Hence, we can obtain the target adapter by combining the other three adapters. Furthermore, we propose a structure-adaptive adapter merging method. Our empirical results demonstrate that our approach achieves effective cross-lingual transfer, outperforming existing methods across all settings.
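The idea of obtaining the target adapter from the other three can be illustrated with a minimal sketch. The abstract does not state the exact merge operations, so the code below rests on assumptions: for adapters that are added to the base weights the language gap is taken as an elementwise difference, and for adapters that rescale activations it is taken as an elementwise ratio. The function names, the dictionary layout, and the numbers are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only, not the authors' implementation.
# An "adapter" here is a plain dict mapping a parameter name to a list of floats.

def merge_additive(ref_tgt, ref_src, task_src):
    """Assumed rule for additive adapters:
    target adapter = task_src + (ref_tgt - ref_src), elementwise."""
    return {
        name: [a + (rt - rs)
               for a, rt, rs in zip(task_src[name], ref_tgt[name], ref_src[name])]
        for name in task_src
    }

def merge_multiplicative(ref_tgt, ref_src, task_src):
    """Assumed rule for multiplicative adapters:
    target adapter = task_src * (ref_tgt / ref_src), elementwise.
    Requires every entry of ref_src to be nonzero."""
    return {
        name: [a * (rt / rs)
               for a, rt, rs in zip(task_src[name], ref_tgt[name], ref_src[name])]
        for name in task_src
    }

# Toy adapters (made-up values):
ref_src = {"w": [1.0, 2.0]}   # reference task, source language
ref_tgt = {"w": [1.5, 2.5]}   # reference task, target language
task_src = {"w": [4.0, 8.0]}  # target task, source language

additive_result = merge_additive(ref_tgt, ref_src, task_src)
multiplicative_result = merge_multiplicative(ref_tgt, ref_src, task_src)
print(additive_result)
print(multiplicative_result)
```

Under the consistency assumption stated in the abstract, the gap measured on the reference task stands in for the unobserved gap on the target task, which is why only three adapters need to be trained. Choosing the merge operation to match how the adapter is composed with the base model is one plausible reading of "structure-adaptive".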