This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) with a unified architecture that integrates text and visual modalities, enabling advanced image understanding and medical applications. Built on the Llama3.1 architecture, BiMediX2 combines text and visual capabilities to support seamless interaction in both English and Arabic, handling text-based inputs as well as multi-turn conversations involving medical images. The model is trained on an extensive bilingual healthcare dataset of 1.6M diverse medical interaction samples, spanning both text and image modalities in Arabic and English. We also propose BiMed-MBench, the first bilingual GPT-4o-based medical LMM benchmark. BiMediX2 is evaluated on both text-based and image-based tasks, achieving state-of-the-art performance across several medical benchmarks and outperforming recent state-of-the-art models on medical LLM evaluations. It also sets a new state of the art in multimodal medical evaluations, with improvements of over 9% in English and over 20% in Arabic, surpasses GPT-4 by around 9% on the UPHILL factual accuracy evaluation, and excels at medical Visual Question Answering, Report Generation, and Report Summarization tasks. The project page, including source code and the trained model, is available at https://github.com/mbzuai-oryx/BiMediX2.