This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) with a unified architecture that integrates text and visual modalities, enabling advanced image understanding and medical applications. Built on the Llama3.1 architecture, BiMediX2 combines text and visual capabilities to support seamless interaction in both English and Arabic, handling text-based inputs as well as multi-turn conversations involving medical images. The model is trained on an extensive bilingual healthcare dataset of 1.6M diverse medical interaction samples, spanning both text and image modalities in Arabic and English. We also propose BiMed-MBench, the first bilingual GPT-4o-based medical LMM benchmark. BiMediX2 is evaluated on both text-based and image-based tasks, achieving state-of-the-art performance across several medical benchmarks and outperforming recent state-of-the-art models on medical LLM evaluations. It also sets a new state of the art in multimodal medical evaluations, with improvements of over 9% in English and over 20% in Arabic, surpasses GPT-4 by around 9% on the UPHILL factual accuracy evaluation, and excels at medical Visual Question Answering, Report Generation, and Report Summarization tasks. The project page, including source code and the trained model, is available at https://github.com/mbzuai-oryx/BiMediX2.