Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT), even when trained without parallel data. Yet, despite the enormous amount of training data, they still struggle to translate rare words, particularly for low-resource languages. Even worse, it is usually unrealistic to retrieve relevant demonstrations for in-context learning in low-resource languages, which restricts the practical use of LLMs for translation. How should we mitigate this problem? To this end, we present a novel method, CoD, which augments LLMs with prior knowledge in the form of chains of multilingual dictionaries for a subset of the input words, in order to elicit the translation abilities of LLMs. Extensive experiments indicate that augmenting ChatGPT with CoD improves ChrF++ for MNMT by up to 13x (from 3.08 to 42.63 for English to Serbian written in Cyrillic script) on the full FLORES-200 devtest set. We further demonstrate the importance of chaining the multilingual dictionaries, as well as the superiority of CoD over few-shot demonstrations for low-resource languages.
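To make the idea of a dictionary chain more concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not the authors' implementation: the toy dictionary, the helper names `build_chain` and `build_cod_prompt`, the "means" connective, and the placement of the hints before the translation instruction are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy dictionary: source word -> {language name: translation}.
# The entries are illustrative only and are not taken from the paper's resources.
TOY_DICTIONARY: Dict[str, Dict[str, str]] = {
    "river": {"Serbian": "река", "French": "rivière", "German": "Fluss"},
    "bridge": {"Serbian": "мост", "French": "pont", "German": "Brücke"},
}


def build_chain(
    word: str,
    target_lang: str,
    auxiliary_langs: List[str],
    dictionary: Dict[str, Dict[str, str]],
) -> Optional[str]:
    """Chain one word through the target language and then the auxiliary languages.

    Returns None when the word has no target-language entry, so only a
    subset of the input words ends up with a chain.
    """
    entry = dictionary.get(word)
    if entry is None or target_lang not in entry:
        return None
    links = [f'"{word}"', f'"{entry[target_lang]}"']
    links += [f'"{entry[lang]}"' for lang in auxiliary_langs if lang in entry]
    return " means ".join(links) + "."


def build_cod_prompt(
    sentence: str,
    source_lang: str,
    target_lang: str,
    auxiliary_langs: List[str],
    dictionary: Dict[str, Dict[str, str]],
) -> str:
    """Prepend dictionary chains (assumed format) to a plain translation request."""
    chains: List[str] = []
    seen = set()
    for token in sentence.split():
        word = token.strip(".,;:!?").lower()
        if not word or word in seen:
            continue
        seen.add(word)
        chain = build_chain(word, target_lang, auxiliary_langs, dictionary)
        if chain is not None:
            chains.append(chain)
    instruction = (
        f"Translate the following text from {source_lang} into {target_lang}: {sentence}"
    )
    return "\n".join(chains + [instruction])


if __name__ == "__main__":
    print(
        build_cod_prompt(
            "The bridge crosses the river.",
            "English",
            "Serbian",
            ["French", "German"],
            TOY_DICTIONARY,
        )
    )
```

In this sketch, words without a dictionary entry (here "the" and "crosses") simply receive no chain, which mirrors the abstract's point that the hints cover only a subset of the input words. The resulting string would then be sent to the LLM in place of the bare translation request.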