LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

Duy M. H. Nguyen, Hoang Nguyen, Nghiem T. Diep, Tan N. Pham, Tri Cao, Binh T. Nguyen, Paul Swoboda, Nhat Ho, Shadi Albarqouni, Pengtao Xie, Daniel Sonntag, Mathias Niepert

Source: arXiv (appendix updated)

Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While deep networks pre-trained on ImageNet and vision-language foundation models trained on web-scale data are the prevailing approaches, their effectiveness on medical tasks is limited by the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, in both in-distribution and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves on previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.
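
To make the matching idea in contribution (ii) concrete, here is a minimal sketch, not the authors' implementation. It assumes two augmented views of the same batch, matches their embeddings by maximizing total cosine similarity, and scores the result with a Hamming loss against the known correspondence (image i in one view should match image i in the other). This is a first-order simplification: it omits the second-order (edge) terms of the graph-matching objective and the gradient estimation through the black-box solver. The brute-force search and the names `cosine`, `best_matching`, and `hamming_loss` are illustrative assumptions.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def best_matching(view_a, view_b):
    """Brute-force the permutation that maximizes total cosine similarity.

    Only feasible for tiny batches; a real pipeline would call a
    combinatorial solver instead (assumption: node-level costs only).
    """
    n = len(view_a)
    sim = [[cosine(a, b) for b in view_b] for a in view_a]
    return max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )


def hamming_loss(perm):
    """Count nodes that are not matched to their own augmented counterpart."""
    return sum(1 for i, j in enumerate(perm) if i != j)


# Toy embeddings: row i of view_b is a perturbed copy of row i of view_a.
view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
view_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.8]]

perm = best_matching(view_a, view_b)
loss = hamming_loss(perm)
```

With well-separated embeddings the solver recovers the identity matching and the loss is zero. During training, a nonzero loss would signal that the encoder maps two views of the same image to embeddings that the solver fails to pair.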

