Despite recent breakthroughs in reasoning-enhanced large language models (LLMs) such as DeepSeek-R1, incorporating inference-time reasoning into machine translation (MT), where human translators naturally employ structured, multi-layered chains of thought (CoTs), remains underexplored. Existing methods either design a fixed CoT tailored to a specific MT sub-task (e.g., literature translation) or synthesize CoTs that are not aligned with human reasoning, limiting their adaptability to diverse translation scenarios. This paper introduces R1-Translator (R1-T1), a novel framework that achieves inference-time reasoning for general MT via reinforcement learning (RL) with human-aligned CoTs comprising six common patterns. Our approach pioneers three innovations: (1) extending reasoning-based translation beyond MT sub-tasks to six languages and diverse tasks (e.g., legal/medical domain adaptation, idiom resolution); (2) formalizing six expert-curated CoT templates that mirror hybrid human strategies such as context-aware paraphrasing and back-translation; and (3) enabling self-evolving CoT discovery through RL. Experimental results show steady translation-quality improvements across 11 languages and 40 translation directions on the Flores-101 test set, especially for languages unseen during training.
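To make the third innovation concrete, the RL-driven discovery of effective CoT patterns can be illustrated with a deliberately simplified sketch: an epsilon-greedy bandit that learns which of six hypothetical CoT templates earns the highest translation reward. All names here (the template labels, `discover_cot_policy`, `toy_reward`) are illustrative assumptions, not the paper's actual implementation, which trains a full policy over free-form reasoning rather than a fixed template menu.

```python
import random

# Hypothetical labels for the six expert-curated CoT patterns mentioned
# in the abstract; the paper's actual template names may differ.
COT_TEMPLATES = [
    "literal_draft_then_refine",
    "context_aware_paraphrasing",
    "back_translation_check",
    "terminology_lookup",
    "idiom_resolution",
    "domain_style_adaptation",
]

def discover_cot_policy(reward_fn, episodes=500, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over CoT templates: a toy stand-in for the
    RL-based self-evolving CoT discovery described in the abstract."""
    rng = random.Random(seed)
    counts = {t: 0 for t in COT_TEMPLATES}
    values = {t: 0.0 for t in COT_TEMPLATES}
    for _ in range(episodes):
        if rng.random() < epsilon:
            template = rng.choice(COT_TEMPLATES)   # explore
        else:
            template = max(COT_TEMPLATES, key=lambda t: values[t])  # exploit
        r = reward_fn(template)
        counts[template] += 1
        # Incremental mean update of the estimated reward for this template.
        values[template] += (r - values[template]) / counts[template]
    return max(COT_TEMPLATES, key=lambda t: values[t])

# Toy reward standing in for a translation-quality metric: here we simply
# pretend back-translation checking scores highest on average.
def toy_reward(template, _rng=random.Random(42)):
    base = 0.9 if template == "back_translation_check" else 0.5
    return base + _rng.uniform(-0.05, 0.05)

best = discover_cot_policy(toy_reward)
```

In the real system the reward would come from a learned or metric-based judge of translation quality, and the policy would select or compose reasoning steps per source sentence rather than a single global template; the bandit above only shows the credit-assignment loop in its simplest form.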