Although machine translation (MT) has achieved remarkable performance, the translation of cultural elements of language, such as idioms, proverbs, and colloquial expressions, remains underexplored. This paper investigates how well state-of-the-art neural machine translation (NMT) models and large language models (LLMs) translate proverbs, which are deeply rooted in cultural contexts. We construct a translation dataset covering four language pairs that contains both standalone proverbs and proverbs used in conversation. Our experiments show that the studied models translate proverbs well between languages with similar cultural backgrounds, and that LLMs generally outperform NMT models in proverb translation. Furthermore, we find that current automatic evaluation metrics such as BLEU, chrF++, and COMET are inadequate for reliably assessing the quality of proverb translations, highlighting the need for more culturally aware evaluation metrics.