Toward Global Large Language Models in Medicine

Rui Yang, Huitao Li, Weihao Xuan, Heli Qi, Xin Li, Kunyu Yu, Yingjian Chen, Rongrong Wang, Jacques Behmoaras, Tianxi Cai, Bibhas Chakraborty, Qingyu Chen, Lionel Tim-Ee Cheng, Marie-Louise Damwanza, Chido Dzinotyiwei, Aosong Feng, Chuan Hong, Yusuke Iwasawa, Yuhe Ke, Linah Kitala, Taehoon Ko, Jisan Lee, Irene Li, Jonathan Chong Kai Liew, Hongfang Liu, Lian Leng Low, Edison Marrese-Taylor, Yutaka Matsuo, Isheanesu Misi, Yilin Ning, Jasmine Chiat Ling Ong, Marcus Eng Hock Ong, Enrico Petretto, Hossein Rouhizadeh, Abiram Sandralegar, Oren Schreier, Iain Bee Huat Tan, Patrick Tan, Daniel Shu Wei Ting, Junjue Wang, Chunhua Weng, Matthew Yu Heng Wong, Fang Wu, Yunze Xiao, Xuhai Xu, Qingcheng Zeng, Zhuo Zheng, Yifan Peng, Douglas Teodoro, Nan Liu

From arXiv; 182 pages, 65 figures

Despite continuous advances in medical technology, the global distribution of health care resources remains uneven. The development of large language models (LLMs) has transformed the landscape of medicine and holds promise for improving health care quality and expanding access to medical information globally. However, existing LLMs are primarily trained on high-resource languages, limiting their applicability in global medical scenarios. To address this gap, we constructed GlobMed, a large multilingual medical dataset, containing over 500,000 entries spanning 12 languages, including four low-resource languages. Building on this, we established GlobMed-Bench, which systematically assesses 56 state-of-the-art proprietary and open-weight LLMs across multiple multilingual medical tasks, revealing significant performance disparities across languages, particularly for low-resource languages. Additionally, we introduced GlobMed-LLMs, a suite of multilingual medical LLMs trained on GlobMed, with parameters ranging from 1.7B to 8B. GlobMed-LLMs achieved an average performance improvement of over 40% relative to baseline models, with a more than threefold increase in performance on low-resource languages. Together, these resources provide an important foundation for advancing the equitable development and application of LLMs globally, enabling broader language communities to benefit from technological advances.
