Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification

Medical image analysis frequently faces data scarcity. Transfer learning has proven effective in addressing this issue while conserving computational resources. The recent advent of foundation models such as DINOv2, which is built on the vision transformer architecture, has opened new opportunities in the field and attracted significant interest. However, DINOv2's performance on clinical data still needs to be verified. In this paper, we performed a glioma grading task using three clinical modalities of brain MRI data. We compared the performance of various pre-trained deep learning models, including those pre-trained on ImageNet and DINOv2, in a transfer learning context. Our focus was on understanding how the freezing mechanism (keeping the pre-trained weights fixed during training) affects performance. We also validated our findings on three other types of public datasets: chest radiography, fundus photography, and dermoscopy. Our findings indicate that on our clinical dataset, DINOv2 did not perform as well as the ImageNet-based pre-trained models, whereas on the public datasets, DINOv2 generally outperformed the other models, especially when the freezing mechanism was used. DINOv2 models of different sizes performed similarly across the different tasks. In summary, DINOv2 is viable for medical image classification tasks, particularly with data resembling natural images. However, its effectiveness may vary on data that differs significantly from natural images, such as MRI. In addition, smaller versions of the model can be adequate for medical tasks, offering resource-saving benefits. Our code is available at https://github.com/GuanghuiFU/medical_DINOv2_eval.
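To make the freezing mechanism concrete, the following is a minimal PyTorch sketch of the two transfer learning settings compared above: a frozen backbone with a trainable classification head, and a fully fine-tuned model. It is not taken from the authors' repository. The helper names (`build_classifier`, `count_trainable`, `toy_backbone`), the linear head, and the tiny stand-in backbone are all assumptions made for illustration, so that the example runs without downloading any pre-trained weights.

```python
import torch
from torch import nn


def build_classifier(backbone: nn.Module, feat_dim: int, num_classes: int,
                     freeze_backbone: bool) -> nn.Module:
    """Attach a linear head to a backbone, optionally freezing the backbone.

    Hypothetical helper; a linear head is an assumption, not the paper's design.
    """
    if freeze_backbone:
        for p in backbone.parameters():
            p.requires_grad = False  # backbone weights get no gradient updates
    return nn.Sequential(backbone, nn.Linear(feat_dim, num_classes))


def count_trainable(model: nn.Module) -> int:
    """Number of parameters that the optimizer is allowed to update."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def toy_backbone() -> nn.Module:
    """Stand-in for a pre-trained network (e.g. an ImageNet CNN or a DINOv2 ViT).

    It maps a 3x8x8 input to a 16-dimensional feature vector.
    """
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())


frozen = build_classifier(toy_backbone(), feat_dim=16, num_classes=2,
                          freeze_backbone=True)
finetuned = build_classifier(toy_backbone(), feat_dim=16, num_classes=2,
                             freeze_backbone=False)

# One training step on random data in the frozen setting: only the head's
# parameters are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in frozen.parameters() if p.requires_grad), lr=1e-3)
x = torch.randn(4, 3, 8, 8)
y = torch.tensor([0, 1, 0, 1])
loss = nn.functional.cross_entropy(frozen(x), y)
loss.backward()
optimizer.step()

print("trainable (frozen):    ", count_trainable(frozen))
print("trainable (fine-tuned):", count_trainable(finetuned))
```

In the frozen setting only the head is trained, which is where the savings in computation come from. One practical caveat: setting `requires_grad = False` does not stop batch normalization layers from updating their running statistics in training mode, so a frozen backbone that contains such layers is usually also switched to `eval()` mode.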
