Large language models (LLMs) perform surprisingly well at multilingual neural machine translation (MNMT), even when trained without parallel data. Yet despite their enormous training corpora, they still struggle to translate rare words, particularly for low-resource languages. Worse, retrieving relevant demonstrations for in-context learning is usually unrealistic for low-resource languages, which limits the practical use of LLMs for translation. How can this problem be mitigated? To this end, we present a novel method, CoD, which augments LLMs with prior knowledge in the form of chains of multilingual dictionary entries for a subset of the input words, thereby eliciting their translation abilities. Extensive experiments on the full FLORES-200 devtest set indicate that augmenting ChatGPT with CoD improves chrF++ for MNMT by up to roughly 13x (from 3.08 to 42.63 for English to Serbian written in Cyrillic script). We further demonstrate the importance of chaining the multilingual dictionaries, as well as the superiority of CoD over few-shot demonstrations for low-resource languages.
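To make the idea of chaining dictionary entries concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not the authors' implementation: the function name `build_chained_prompt`, the "means" connective, the wording of the translation request, and the toy dictionary entries are all assumptions made for illustration.

```python
import re
from typing import Dict, List


def build_chained_prompt(
    sentence: str,
    src_lang: str,
    tgt_lang: str,
    chains: Dict[str, List[str]],
) -> str:
    """Prepend one chained-dictionary hint per covered input word.

    `chains` maps a source word to its translations, target language first,
    followed by any auxiliary languages (hypothetical layout).
    """
    # Crude word-level tokenisation; a real system would use a proper tokenizer.
    tokens = set(re.findall(r"\w+", sentence.lower()))

    hint_lines = []
    for word, translations in chains.items():
        # Hint only words that occur in the input and have at least one entry.
        if word.lower() not in tokens or not translations:
            continue
        links = [word] + translations
        hint_lines.append(" means ".join(f'"{w}"' for w in links) + ".")

    # Assumed phrasing of the final translation request.
    request = (
        f"Translate the following text from {src_lang} into {tgt_lang}: {sentence}"
    )
    return "\n".join(hint_lines + [request])


if __name__ == "__main__":
    # Toy entries for illustration only (Serbian Cyrillic, German, French).
    toy_chains = {
        "river": ["река", "Fluss", "rivière"],
        "mountain": ["планина", "Berg", "montagne"],
    }
    print(build_chained_prompt("The river is wide.", "English", "Serbian", toy_chains))
```

Only words that appear in the input receive a hint, which mirrors the abstract's point that dictionary knowledge is supplied for a subset of the input words. How that subset is chosen and which auxiliary languages enter the chain are left open here.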