Large language models (LLMs) have shown remarkable performance in general translation tasks. However, there is an increasing demand for high-quality translations that are not only adequate but also fluent and elegant. To assess the extent to which current LLMs can meet this demand, we introduce a benchmark for translating classical Chinese poetry into English. This task requires not only adequacy in translating culturally and historically significant content but also strict adherence to linguistic fluency and poetic elegance. Our study reveals that existing LLMs fall short on this task. To address this shortfall, we propose RAT, a \textbf{R}etrieval-\textbf{A}ugmented machine \textbf{T}ranslation method that enhances the translation process by incorporating knowledge related to classical poetry. Additionally, we propose an automatic evaluation metric based on GPT-4, which better assesses translation quality in terms of adequacy, fluency, and elegance, overcoming the limitations of traditional metrics. Our dataset and code will be made available.