How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning

Multilingual large language models (MLLMs) are jointly trained on data from many different languages so that the representation of each language can benefit from other languages' data. Impressive performance on zero-shot cross-lingual transfer shows that these models are capable of exploiting data from other languages. Yet it remains unclear to what extent, and under which conditions, languages rely on each other's data. In this study, we use TracIn (Pruthi et al., 2020), a training data attribution (TDA) method, to retrieve the most influential training samples seen during multilingual fine-tuning for a particular test language. This allows us to analyse cross-lingual sharing mechanisms of MLLMs from a new perspective. While previous work studied cross-lingual sharing at the level of model parameters, we present the first approach to study cross-lingual sharing at the data level. We find that MLLMs rely on data from multiple languages from the early stages of fine-tuning and that this reliance gradually increases as fine-tuning progresses. We further study how different fine-tuning languages influence model performance on a given test language and find that they can both reinforce and complement the knowledge acquired from data of the test language itself.
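To make the attribution step concrete, here is a minimal sketch of a checkpoint-based TracIn score in the spirit of Pruthi et al. (2020): the influence of a training example on a test example is approximated by summing, over saved checkpoints, the learning rate times the dot product of their loss gradients. The abstract does not say which TracIn variant or normalisation the study uses, so this is an illustration under stated assumptions, not the authors' implementation. A toy logistic regression stands in for the MLLM, and the checkpoints, learning rates, data, and language tags are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(w, x, y):
    """Gradient of the logistic loss for one example (x, y) with respect to w."""
    return (sigmoid(w @ x) - y) * x

def tracin_cp(checkpoints, lrs, x_train, y_train, x_test, y_test):
    """Checkpoint-based TracIn score: sum of lr * <grad_train, grad_test>."""
    score = 0.0
    for w, lr in zip(checkpoints, lrs):
        g_train = loss_grad(w, x_train, y_train)
        g_test = loss_grad(w, x_test, y_test)
        score += lr * (g_train @ g_test)
    return float(score)

def top_k_influential(checkpoints, lrs, train_set, x_test, y_test, k=2):
    """Return (index, score) pairs for the k highest-scoring training examples."""
    scores = [tracin_cp(checkpoints, lrs, x, y, x_test, y_test)
              for x, y in train_set]
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), scores[i]) for i in order]

# Hypothetical toy setup: two saved checkpoints of a 2-parameter model.
checkpoints = [np.array([0.0, 0.0]), np.array([0.5, -0.5])]
lrs = [0.1, 0.1]

# Hypothetical training examples, each tagged with a fine-tuning language.
train_set = [
    (np.array([1.0, 0.0]), 1.0),
    (np.array([0.0, 1.0]), 0.0),
    (np.array([1.0, 0.0]), 0.0),
]
train_langs = ["en", "de", "es"]

# One test example from the test language of interest.
x_test, y_test = np.array([1.0, 0.0]), 1.0

ranking = top_k_influential(checkpoints, lrs, train_set, x_test, y_test, k=2)
for idx, score in ranking:
    print(train_langs[idx], round(score, 4))
```

A positive score marks a training example whose gradient steps tend to reduce the test loss, while a negative score marks one that tends to increase it. Grouping the top-ranked examples by their language tag gives the kind of per-language view of influence that the abstract describes.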

