Advancing Multimodal Medical Capabilities of Gemini

Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, S. Sara Mahdavi, Khaled Saab, Tao Tu, Sreenivasa Raju Kalidindi, Mozziyar Etemadi, Jorge Cuadros, Gregory Sorensen, Yossi Matias, Katherine Chou, Greg Corrado, Joelle Barral, Shravya Shetty, David Fleet, S. M. Ali Eslami, Daniel Tse, Shruthi Prabhakara, Cory McLean, Dave Steiner, Rory Pilgrim, Christopher Kelly, Shekoofeh Azizi, Daniel Golden

Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results on two separate datasets by absolute margins of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first-ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding state-of-the-art (SoTA) results or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines on 18 of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases on which it was never trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.
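The abstract names a "standard linear polygenic risk score-based approach" as the genomics baseline without describing it. As a rough illustration only, a linear polygenic risk score is conventionally computed as a weighted sum of allele dosages, PRS = sum_i beta_i * g_i. The sketch below assumes that conventional form; the function name `linear_prs`, the toy weights, and the toy dosages are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def linear_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Conventional linear polygenic risk score: sum_i weights[i] * dosages[i].

    dosages: per-variant allele counts for one individual (typically 0, 1, or 2).
    weights: per-variant effect sizes, e.g. from an association study.
    Both names are illustrative assumptions, not the paper's implementation.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(beta * g for beta, g in zip(weights, dosages))


# Hypothetical toy data: three variants for a single individual.
toy_weights = [0.5, -0.25, 1.0]
toy_dosages = [2, 1, 0]
toy_score = linear_prs(toy_dosages, toy_weights)
```

A score of this kind is linear in the genotype by construction, which is the property the abstract contrasts Med-Gemini-Polygenic against.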
