Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets such as MIMIC-CXR, and spans a diverse range of biomedical image types. PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles. Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing. We conducted extensive experiments and ablation studies on standard biomedical imaging tasks ranging from retrieval to classification to visual question-answering (VQA). BiomedCLIP achieved new state-of-the-art results on a wide range of standard datasets, substantially outperforming prior approaches. Intriguingly, by large-scale pretraining on diverse biomedical image types, BiomedCLIP even outperformed state-of-the-art radiology-specific models such as BioViL on radiology-specific tasks such as RSNA pneumonia detection. In summary, BiomedCLIP is a fully open-access foundation model that achieves state-of-the-art performance on various biomedical tasks, paving the way for transformative multimodal biomedical discovery and applications. We release our models at https://aka.ms/biomedclip to facilitate future research in multimodal biomedical AI.
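To make concrete what pretraining on parallel image-text pairs can look like, the following is a minimal sketch of a CLIP-style symmetric contrastive loss. It assumes, based on the model's name, that BiomedCLIP follows the general CLIP objective; the abstract does not spell out the loss, so the function names, the toy embeddings, and the temperature of 0.07 are illustrative assumptions rather than the authors' implementation.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to come from the same
    image-text pair; every other combination is treated as a negative.
    The temperature value is an illustrative assumption.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: row i should assign most mass to column i.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should assign most mass to row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)

# Toy 2-D embeddings: correctly paired vs. deliberately mismatched.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                 [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In this toy setup the loss is small when each image embedding points in the same direction as its own caption embedding and large when the pairing is swapped, which is the signal a contrastive objective of this kind would use to align the two modalities.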