Towards General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks

The integration of deep learning systems into healthcare has been hindered by the resource-intensive process of data annotation and by the inability of these systems to generalize to different data distributions. Foundation models, which are pre-trained on large datasets, have emerged as a way to reduce reliance on annotated data and to improve model generalizability and robustness. DINOv2 is an open-source foundation model pre-trained with self-supervised learning on 142 million curated natural images, and it shows promising capabilities across a range of vision tasks. Nevertheless, it remains an open question whether DINOv2 adapts to radiological imaging and whether its features are general enough to benefit radiology image analysis. This study therefore evaluates DINOv2 for radiology comprehensively, conducting over 100 experiments across diverse modalities (X-ray, CT, and MRI). To measure the effectiveness and generalizability of DINOv2's feature representations, we analyze the model on medical image analysis tasks, including disease classification and organ segmentation on both 2D and 3D images, under settings such as kNN, few-shot learning, linear probing, end-to-end fine-tuning, and parameter-efficient fine-tuning. Comparative analyses with established supervised, self-supervised, and weakly-supervised models reveal DINOv2's superior performance and cross-task generalizability. The findings offer insights into optimizing pre-training strategies for medical imaging and improve the broader understanding of DINOv2's role in bridging the gap between natural and radiological image analysis. Our code is available at https://github.com/MohammedSB/DINOv2ForRadiology
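
To make two of the evaluation settings named above concrete, the following is a minimal sketch of kNN classification and linear probing on frozen features. It is not the code from the repository above: the features are synthetic stand-ins for backbone embeddings, and every name in it (`knn_predict`, `fit_linear_probe`, `make_synthetic_features`, `run_demo`) as well as the hyperparameters (`k`, `lr`, `epochs`) are assumptions chosen for illustration only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def knn_predict(train_feats, train_labels, test_feats, k=5):
    """Majority vote among the k most cosine-similar training features."""
    sims = l2_normalize(test_feats) @ l2_normalize(train_feats).T
    nn_idx = np.argsort(-sims, axis=1)[:, :k]      # indices of the k nearest neighbours
    nn_labels = train_labels[nn_idx]               # shape: (n_test, k)
    n_classes = int(train_labels.max()) + 1
    votes = np.stack(
        [(nn_labels == c).sum(axis=1) for c in range(n_classes)], axis=1
    )
    return votes.argmax(axis=1)


def fit_linear_probe(feats, labels, n_classes, lr=0.1, epochs=200):
    """Softmax regression on frozen features (the backbone is never updated)."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                # gradient of mean cross-entropy
        W -= lr * (feats.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def make_synthetic_features(n_per_class=100, dim=32, n_classes=3, seed=0):
    """Hypothetical stand-in for embeddings: one Gaussian cluster per class."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=4.0, size=(n_classes, dim))
    feats = np.concatenate(
        [c + rng.normal(size=(n_per_class, dim)) for c in centers]
    )
    labels = np.repeat(np.arange(n_classes), n_per_class)
    perm = rng.permutation(len(labels))
    return feats[perm], labels[perm]


def run_demo():
    """Evaluate both protocols on a simple train/test split of the synthetic data."""
    feats, labels = make_synthetic_features()
    split = int(0.8 * len(labels))
    x_tr, y_tr, x_te, y_te = feats[:split], labels[:split], feats[split:], labels[split:]

    knn_acc = float((knn_predict(x_tr, y_tr, x_te, k=5) == y_te).mean())

    W, b = fit_linear_probe(x_tr, y_tr, n_classes=3)
    probe_acc = float(((x_te @ W + b).argmax(axis=1) == y_te).mean())
    return knn_acc, probe_acc


if __name__ == "__main__":
    knn_acc, probe_acc = run_demo()
    print(f"kNN accuracy: {knn_acc:.3f}, linear-probe accuracy: {probe_acc:.3f}")
```

In an actual evaluation, the synthetic arrays would be replaced by features extracted once from the frozen pre-trained backbone. Both protocols leave the backbone untouched, which is why they are commonly used to assess representation quality separately from fine-tuning capacity.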
