Traditional neural machine translation (NMT) systems often fail to translate sentences that contain culture-specific information. Most previous NMT methods incorporate external cultural knowledge during training, which requires fine-tuning on low-frequency items specific to the culture. Recent in-context learning methods use lightweight prompts to guide large language models (LLMs) to perform machine translation; however, it remains unclear whether this approach can inject cultural awareness into machine translation. To this end, we introduce a new data curation pipeline to construct a culturally relevant parallel corpus, enriched with annotations of culture-specific entities. Additionally, we design simple but effective prompting strategies to assist LLM-based translation. Extensive experiments show that our approaches substantially help incorporate cultural knowledge into LLM-based machine translation, outperforming traditional NMT systems in translating culture-specific sentences.
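To make the idea of prompting with annotated culture-specific entities concrete, the following is a minimal sketch, not the prompting strategy evaluated in this work. It assumes each sentence comes with a list of annotated entities and a short explanation for each; the names `CulturalEntity` and `build_prompt`, the prompt wording, and the default language pair are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CulturalEntity:
    """Hypothetical annotation record for one culture-specific entity."""
    surface: str      # the entity as it appears in the source sentence
    explanation: str  # a short gloss, e.g. drawn from an external knowledge source


def build_prompt(source: str, entities: list[CulturalEntity],
                 src_lang: str = "English", tgt_lang: str = "Chinese") -> str:
    """Assemble a translation prompt that lists glosses for annotated entities.

    Only entities whose surface form occurs in the source sentence are included,
    so irrelevant annotations do not clutter the prompt.
    """
    lines = [f"Translate the following {src_lang} sentence into {tgt_lang}."]
    mentioned = [e for e in entities if e.surface in source]
    if mentioned:
        lines.append("Background on culture-specific terms:")
        lines.extend(f"- {e.surface}: {e.explanation}" for e in mentioned)
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


if __name__ == "__main__":
    annotations = [
        CulturalEntity("haggis", "a Scottish dish of minced sheep offal, oatmeal, and spices"),
    ]
    # The resulting string would be sent to an LLM of the reader's choice.
    print(build_prompt("We had haggis for dinner.", annotations))
```

The sketch stops at the prompt string on purpose: which model receives it, and how the output is scored, are left open here.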