The successful adaptation of multilingual language models (LMs) to a specific language-task pair depends critically on the availability of data tailored to that pair. While cross-lingual transfer (XLT) methods have helped address this data scarcity problem, the mechanisms behind their effectiveness remain under debate. In this work, we focus on one promising assumption about the inner workings of XLT: that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change as the number of source languages involved in the process varies. Our experimental findings show that using multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of the embedding spaces of different languages, supporting the claim that XLT benefits from language-independent information. On the other hand, we find that an arbitrary combination of source languages does not guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically demonstrate their effectiveness.