Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, real-time adaptation remains challenging. Large-scale language models (LLMs) have recently shown interesting in-context learning capabilities: they learn to replicate certain input-output text generation patterns without further fine-tuning. When an LLM is fed a prompt consisting of a list of translation pairs, it can simulate the domain and style characteristics of those pairs at inference time. This work investigates how in-context learning can be used to improve real-time adaptive MT. Our extensive experiments show promising results at translation time. For example, GPT-3.5 can adapt to a set of in-domain sentence pairs and/or terminology while translating a new sentence. We observe that the translation quality with few-shot in-context learning can surpass that of strong encoder-decoder MT systems, especially for high-resource languages. Moreover, we investigate whether MT output from strong encoder-decoder models can be combined with fuzzy matches in the prompt, a combination that can further improve translation quality, especially for less supported languages. We conduct our experiments across five diverse language pairs, namely English-to-Arabic (EN-AR), English-to-Chinese (EN-ZH), English-to-French (EN-FR), English-to-Kinyarwanda (EN-RW), and English-to-Spanish (EN-ES).
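The idea of prompting with fuzzy matches can be illustrated with a minimal sketch. It is not the authors' implementation: the translation memory, the function names (`fuzzy_score`, `top_fuzzy_matches`, `build_prompt`), the prompt layout, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration. The sketch only builds the few-shot prompt; sending it to an LLM is left out.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real fuzzy-match metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_fuzzy_matches(source, memory, k=2):
    """Return the k (source, target) pairs whose source side is most similar to `source`."""
    ranked = sorted(memory, key=lambda pair: fuzzy_score(source, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(source, matches, src_lang="English", tgt_lang="Spanish"):
    """Lay out the matches as few-shot examples, then the new sentence to translate."""
    lines = []
    # Assumption of this sketch: least similar match first, so the closest
    # match sits directly before the new sentence.
    for src, tgt in reversed(matches):
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")  # left open for the model to complete
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical in-domain translation memory (EN-ES).
    memory = [
        ("Click the Save button to store your changes.",
         "Haga clic en el botón Guardar para almacenar los cambios."),
        ("The weather is nice today.",
         "Hoy hace buen tiempo."),
        ("Click the Cancel button to discard your changes.",
         "Haga clic en el botón Cancelar para descartar los cambios."),
    ]
    new_sentence = "Click the Delete button to remove your changes."
    matches = top_fuzzy_matches(new_sentence, memory, k=2)
    print(build_prompt(new_sentence, matches))
```

In this layout the in-domain pairs carry the terminology and style the model is expected to imitate, and the trailing target-language label marks where the translation should begin. Approved terminology or an encoder-decoder MT draft could be added as further prompt lines in the same way, though the exact format is again a design choice rather than something the abstract specifies.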