Pretrained language models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks such as translation and multilingual word sense disambiguation (WSD). However, they often struggle to disambiguate word senses in a zero-shot setting. To better understand this contrast, we present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT), an extension of word-level translation (WLT) that prompts the model to translate a given word in context. We find that as model size increases, PLMs encode more cross-lingual word sense knowledge and make better use of context to improve WLT performance. Building on C-WLT, we introduce a zero-shot approach for WSD, tested on 18 languages from the XL-WSD dataset. Our method outperforms fully supervised baselines on recall for many evaluation languages without additional training or finetuning. This study presents a first step towards understanding how best to leverage the cross-lingual knowledge inside PLMs for robust zero-shot reasoning in any language.
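To make the idea of translating a word in context more concrete, the following is a minimal sketch in Python. It is not the implementation used in this study: the prompt wording, the function names (`build_wlt_prompt`, `build_cwlt_prompt`, `senses_matching_translation`), and the toy sense inventory are all assumptions introduced for illustration. The sketch only builds prompt strings and shows one plausible way a model's translation could be mapped back to candidate senses; it does not call a language model.

```python
from typing import Dict, List


def build_wlt_prompt(word: str, src_lang: str, tgt_lang: str) -> str:
    """Context-free word-level translation prompt (hypothetical template)."""
    return f'The {src_lang} word "{word}" is translated into {tgt_lang} as "'


def build_cwlt_prompt(sentence: str, word: str, src_lang: str, tgt_lang: str) -> str:
    """Contextual prompt: the same request, but with the sentence included.

    The template is an assumption, not the wording used in the study.
    """
    return (
        f'In the {src_lang} sentence "{sentence}", '
        f'the word "{word}" is translated into {tgt_lang} as "'
    )


def senses_matching_translation(
    translation: str, sense_lexicalizations: Dict[str, List[str]]
) -> List[str]:
    """Return the sense IDs whose target-language words include the translation.

    `sense_lexicalizations` maps a sense ID to target-language words for that
    sense; here it is a toy, hand-written inventory.
    """
    normalized = translation.strip().lower()
    return [
        sense_id
        for sense_id, words in sense_lexicalizations.items()
        if normalized in {w.lower() for w in words}
    ]


# Toy example: two senses of English "bank" with French lexicalizations.
toy_inventory = {
    "bank.financial": ["banque"],
    "bank.river": ["rive", "berge"],
}

prompt = build_cwlt_prompt(
    "She sat on the bank of the river.", "bank", "English", "French"
)
# A model completion such as "rive" would select the river sense.
selected = senses_matching_translation("rive", toy_inventory)
```

In this sketch the context enters only through the prompt, so comparing completions for `build_wlt_prompt` and `build_cwlt_prompt` would be one way to observe how much a model uses context when translating an ambiguous word.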