Bilingual Lexicon Induction (BLI), where words are translated between two languages, is an important NLP task. While noticeable progress has been achieved on BLI in rich-resource languages using static word embeddings, word translation performance can be further improved by incorporating information from contextualized word embeddings. In this paper, we introduce ProMap, a novel approach for BLI that leverages the power of prompting pretrained multilingual and multidialectal language models to address these challenges. To overcome the use of subword tokens in these models, ProMap relies on an effective padded prompting of language models with a seed dictionary, an approach that achieves good performance when used on its own. We also demonstrate the effectiveness of ProMap in re-ranking results from other BLI methods, such as those based on aligned static word embeddings. When evaluated on both rich-resource and low-resource languages, ProMap consistently achieves state-of-the-art results. Furthermore, ProMap enables strong performance in few-shot scenarios (even with fewer than 10 training examples), making it a valuable tool for low-resource language translation. Overall, we believe our method offers an exciting and promising direction for BLI in general and for low-resource languages in particular. ProMap code and data are available at \url{https://github.com/4mekki4/promap}.