Multilingual pre-trained language models have demonstrated impressive (zero-shot) cross-lingual transfer abilities; however, their performance is hindered when the target language is typologically distant from the source languages or when pre-training data is limited in size. In this paper, we propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the XTREME tasks, including text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size language models pre-trained with our proposed method exhibit consistent performance improvements. Furthermore, our method provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.