Large multilingual language models such as mBERT or XLM-R enable zero-shot cross-lingual transfer in various IR and NLP tasks. Cao et al. (2020) proposed a data- and compute-efficient method for cross-lingual adjustment of mBERT that uses a small parallel corpus to make embeddings of related words across languages similar to each other. They showed it to be effective in NLI for five European languages. In contrast, we experiment with a typologically diverse set of languages (Spanish, Russian, Vietnamese, and Hindi) and extend their original implementation to new tasks (XSR, NER, and QA) and an additional training regime (continual learning). Our study reproduced gains in NLI for four languages and showed improved NER, XSR, and cross-lingual QA results in three languages (though some cross-lingual QA gains were not statistically significant). Monolingual QA performance, however, never improved and sometimes degraded. Analysis of distances between contextualized embeddings of related and unrelated words (across languages) showed that fine-tuning leads to "forgetting" some of the cross-lingual alignment information. Based on this observation, we further improved NLI performance using continual learning.
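As a rough illustration of the adjustment idea summarized above, the sketch below computes a squared-distance alignment loss over embeddings of word pairs aligned in a parallel corpus, plus the mean cosine distance one might use to compare related and unrelated pairs. The function names, the NumPy arrays standing in for mBERT contextualized embeddings, and the exact form of the loss are assumptions made for illustration; the objective actually used by Cao et al. (2020) may include further terms.

```python
import numpy as np


def alignment_loss(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> float:
    """Mean squared L2 distance between embeddings of aligned word pairs.

    Row i of src_vecs and row i of tgt_vecs are assumed (hypothetically) to be
    contextualized embeddings of a word and its translation.
    """
    diff = src_vecs - tgt_vecs
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def mean_cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Average cosine distance (1 - cosine similarity) over row-wise pairs."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a_unit * b_unit, axis=1)))


if __name__ == "__main__":
    # Toy vectors standing in for real model outputs (illustrative only).
    rng = np.random.default_rng(0)
    src = rng.normal(size=(4, 8))
    related = src + 0.01 * rng.normal(size=(4, 8))   # near-identical pairs
    unrelated = rng.normal(size=(4, 8))              # independent vectors
    print("loss (related):  ", alignment_loss(src, related))
    print("loss (unrelated):", alignment_loss(src, unrelated))
    print("cos dist (related):  ", mean_cosine_distance(src, related))
    print("cos dist (unrelated):", mean_cosine_distance(src, unrelated))
```

Minimizing such a loss during adjustment pulls related words together; tracking the gap between related and unrelated distances before and after fine-tuning is one way to observe whether alignment information is being "forgotten".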