The Reasoning Lingua Franca: A Double-Edged Sword for Multilingual AI

Large Reasoning Models (LRMs) achieve strong performance on mathematical, scientific, and other question-answering tasks, but their multilingual reasoning abilities remain underexplored. When presented with non-English questions, LRMs often default to reasoning in English, which raises concerns about interpretability and about how well they handle linguistic and cultural nuances. We systematically compare an LRM's reasoning in English with its reasoning in the language of the question. Our evaluation spans two tasks: MGSM and GPQA Diamond. Beyond measuring answer accuracy, we also analyze cognitive behaviors in the reasoning traces. We find that English reasoning traces exhibit these cognitive behaviors substantially more often, and that reasoning in English generally yields higher final-answer accuracy, with the gap widening as tasks become more complex. However, this English-centric strategy is susceptible to a key failure mode, getting "Lost in Translation": translation steps introduce errors that reasoning in the question's language would have avoided.
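The core comparison is a per-task, per-language accuracy difference between the two reasoning modes. The sketch below shows one way such a tabulation could be computed; it is not the authors' evaluation code, and the `Record` type, its field names, and the `accuracy_gap` helper are hypothetical. A positive gap means English reasoning was more accurate than reasoning in the question's language.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """One graded model answer. All field names are hypothetical."""
    task: str            # e.g. "MGSM" or "GPQA Diamond"
    question_lang: str   # language the question was posed in, e.g. "de"
    reasoning_lang: str  # language of the reasoning trace, e.g. "en" or "de"
    correct: bool        # whether the final answer matched the gold answer


def accuracy_gap(records):
    """Return {(task, question_lang): acc_english - acc_question_lang}
    for every group that has results in both reasoning modes."""
    # counts[(task, question_lang, mode)] = [num_correct, num_total]
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        if r.question_lang == "en":
            continue  # English questions offer no English-vs-native contrast
        if r.reasoning_lang == "en":
            mode = "en"
        elif r.reasoning_lang == r.question_lang:
            mode = "native"
        else:
            continue  # reasoning in some third language: not compared here
        cell = counts[(r.task, r.question_lang, mode)]
        cell[0] += int(r.correct)
        cell[1] += 1

    gaps = {}
    for (task, qlang, mode), (hits, total) in counts.items():
        if mode != "en":
            continue
        native = counts.get((task, qlang, "native"))  # .get does not insert
        if native is None or native[1] == 0:
            continue  # no question-language results to compare against
        gaps[(task, qlang)] = hits / total - native[0] / native[1]
    return gaps


if __name__ == "__main__":
    # Made-up records for illustration only; these are not results from the paper.
    toy = [
        Record("MGSM", "de", "en", True),
        Record("MGSM", "de", "en", True),
        Record("MGSM", "de", "de", True),
        Record("MGSM", "de", "de", False),
    ]
    print(accuracy_gap(toy))  # {('MGSM', 'de'): 0.5}
```

Keeping raw correct/total counts rather than averaging per record makes it straightforward to extend the same grouping, for example to count "Lost in Translation" cases where the question-language trace is correct but the English trace is not.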

