Literary translation remains one of the most challenging frontiers in machine translation due to the complexity of capturing figurative language, cultural nuances, and unique stylistic elements. In this work, we introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company, including a CEO, Senior Editor, Junior Editor, Translator, Localization Specialist, and Proofreader. The translation process is divided into two stages: a preparation stage, in which the team is assembled and comprehensive translation guidelines are drafted, and an execution stage, which involves sequential translation, localization, proofreading, and a final quality check. Furthermore, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP), which evaluates translations based solely on target-language quality and cultural appropriateness, and Bilingual LLM Preference (BLP), which leverages large language models such as GPT-4 for direct text comparison. Although TransAgents achieves lower d-BLEU scores because of the limited diversity of references, its translations are significantly better than those of other baselines and are preferred by both human evaluators and LLMs over traditional human references and GPT-4 translations. Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.