Open-source, multilingual medical language models can benefit a wide, linguistically diverse audience across regions. To advance this area, we make three contributions. First, we construct a multilingual medical corpus, termed MMedC, containing approximately 25.5B tokens across 6 main languages, which enables auto-regressive domain adaptation of general large language models (LLMs). Second, to monitor the development of multilingual medical LLMs, we propose a multilingual medical multiple-choice question-answering benchmark with rationales, termed MMedBench. Third, we assess a number of open-source LLMs on our benchmark, along with those further trained auto-regressively on MMedC. Our final model, MMed-Llama 3, with only 8B parameters, outperforms all other open-source models on both MMedBench and English benchmarks, even rivaling GPT-4. In summary, this work presents a large-scale corpus, a benchmark, and a series of models to support the development of multilingual medical LLMs.