Long reasoning models often struggle in multilingual settings: they tend to reason in English for non-English questions, and when constrained to reason in the question language, their accuracy drops substantially. This struggle stems from limited ability in both multilingual question understanding and multilingual reasoning. To address both problems, we propose TRIT (Translation-Reasoning Integrated Training), a self-improving framework that integrates translation training into multilingual reasoning. Without external feedback or additional multilingual data, our method jointly enhances multilingual question understanding and response generation. On MMATH, our method outperforms multiple baselines by an average of 7 percentage points, improving both answer correctness and language consistency. Further analysis reveals that integrating translation training improves cross-lingual question alignment by over 10 percentage points and enhances translation quality for both mathematical questions and general-domain text, with gains of up to 8.4 COMET points on FLORES-200.