Large language models (LLMs) have shown remarkable performance in general translation tasks. However, there is an increasing demand for high-quality translations that are not only adequate but also fluent and elegant. To assess the extent to which current LLMs can meet these demands, we introduce a suitable benchmark for translating classical Chinese poetry into English. This task requires not only adequacy in translating culturally and historically significant content but also strict adherence to linguistic fluency and poetic elegance. Our study reveals that existing LLMs fall short on this task. To address this shortfall, we propose RAT, a \textbf{R}etrieval-\textbf{A}ugmented machine \textbf{T}ranslation method that enhances the translation process by incorporating knowledge related to classical poetry. Additionally, we propose an automatic evaluation metric based on GPT-4 that better assesses translation quality in terms of adequacy, fluency, and elegance, overcoming the limitations of traditional metrics. Our dataset and code will be made available.