While reasoning-enhanced large language models perform strongly on English medical tasks, a persistent multilingual gap remains: reasoning in local languages is substantially weaker, limiting equitable global medical deployment. To bridge this gap, we introduce Med-CoReasoner, a language-informed co-reasoning framework that elicits parallel English and local-language reasoning, abstracts both into structured concepts, and integrates local clinical knowledge into an English logical scaffold via concept-level alignment and retrieval. This design combines the structural robustness of English reasoning with the practice-grounded expertise encoded in local languages. To evaluate multilingual medical reasoning beyond multiple-choice settings, we construct MultiMed-X, a benchmark covering seven languages with expert-annotated long-form question answering and natural language inference tasks, comprising 350 instances per language. Experiments across three benchmarks show that Med-CoReasoner improves multilingual reasoning performance by an average of 5%, with particularly substantial gains in low-resource languages. Moreover, model distillation and expert evaluation further confirm that Med-CoReasoner produces clinically sound and culturally grounded reasoning traces.