Bilingual Lexicon Induction (BLI) is a core task in multilingual NLP that still, to a large extent, relies on calculating cross-lingual word representations. Inspired by the global paradigm shift in NLP towards Large Language Models (LLMs), we examine the potential of the latest generation of LLMs for the development of bilingual lexicons. We ask the following research question: Is it possible to prompt and fine-tune multilingual LLMs (mLLMs) for BLI, and how does this approach compare against and complement current BLI approaches? To this end, we systematically study 1) zero-shot prompting for unsupervised BLI, 2) few-shot in-context prompting with a set of seed translation pairs, both without any LLM fine-tuning, and 3) standard BLI-oriented fine-tuning of smaller LLMs. We experiment with 18 open-source text-to-text mLLMs of different sizes (from 0.3B to 13B parameters) on two standard BLI benchmarks covering a range of typologically diverse languages. Our work is the first to demonstrate strong BLI capabilities of text-to-text mLLMs. The results reveal that few-shot prompting with in-context examples drawn from nearest neighbours achieves the best performance, establishing new state-of-the-art BLI scores for many language pairs. We also conduct a series of in-depth analyses and ablation studies, which provide further insights into BLI with (m)LLMs and expose their limitations.
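To make the best-performing setting more concrete, the following is a minimal sketch of few-shot prompting with nearest-neighbour in-context examples. It assumes that each seed source word has a pre-computed word vector and that "nearest" means highest cosine similarity to the query word's vector. The toy 2-D vectors, the German-English word list, the helper names (`nearest_seed_pairs`, `build_prompt`) and the prompt template are all hypothetical illustrations, not the exact retrieval procedure or template used in this work; the resulting string would then be fed to an mLLM, whose continuation is read off as the predicted translation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_seed_pairs(query_vec, seed_pairs, src_vecs, k=2):
    """Hypothetical helper: pick the k seed pairs whose source word is most
    similar (by cosine) to the query word's vector."""
    ranked = sorted(
        seed_pairs,
        key=lambda pair: cosine(query_vec, src_vecs[pair[0]]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, examples, src_lang, tgt_lang):
    """Hypothetical template: one line per in-context example, then an
    open-ended line for the query that the model is expected to complete."""
    lines = [f"The {src_lang} word '{s}' in {tgt_lang} is: {t}." for s, t in examples]
    lines.append(f"The {src_lang} word '{query}' in {tgt_lang} is:")
    return "\n".join(lines)

# Toy seed dictionary and toy source-side vectors (assumed, for illustration only).
seed_pairs = [("Hund", "dog"), ("Katze", "cat"), ("Haus", "house"), ("Auto", "car")]
src_vecs = {
    "Hund": [1.0, 0.1],
    "Katze": [0.9, 0.2],
    "Haus": [0.1, 1.0],
    "Auto": [0.2, 0.9],
}
query, query_vec = "Pferd", [0.95, 0.15]

examples = nearest_seed_pairs(query_vec, seed_pairs, src_vecs, k=2)
prompt = build_prompt(query, examples, "German", "English")
print(prompt)
```

With these toy vectors, the two animal words are retrieved as in-context examples, so the prompt shows the model semantically related translations right before the query. The zero-shot setting corresponds to calling `build_prompt` with an empty example list.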