In this paper, we introduce BiMediX, the first bilingual medical mixture-of-experts LLM designed for seamless interaction in both English and Arabic. Our model supports a wide range of medical interactions in both languages, including multi-turn chats that inquire about additional details such as patient symptoms and medical history, multiple-choice question answering, and open-ended question answering. We propose a semi-automated English-to-Arabic translation pipeline with human refinement to ensure high-quality translations. We also introduce a comprehensive evaluation benchmark for Arabic medical LLMs. Furthermore, we introduce BiMed1.3M, an extensive Arabic-English bilingual instruction set covering 1.3 million diverse medical interactions, resulting in over 632 million healthcare-specialized tokens for instruction tuning. The BiMed1.3M dataset includes 250k synthesized multi-turn doctor-patient chats and maintains a 1:2 Arabic-to-English ratio. Our model outperforms the state-of-the-art Med42 and Meditron by average absolute gains of 2.5% and 4.1%, respectively, computed across multiple medical evaluation benchmarks in English, while running inference 8 times faster. Moreover, BiMediX outperforms the generic Arabic-English bilingual LLM Jais-30B by average absolute gains of 10% on our Arabic medical benchmark and 15% on bilingual evaluations across multiple datasets. Our project page with source code and trained model is available at https://github.com/mbzuai-oryx/BiMediX.