Given the large volume of medical imaging data, advanced AI methodologies are needed to assist radiologists in diagnosing thoracic diseases from chest X-rays (CXRs). Existing deep learning models typically require large labeled datasets, which are scarce in medical imaging because annotation is time-consuming and expert-driven. In this paper, we extend an existing approach to enhance zero-shot learning in medical imaging by integrating Contrastive Language-Image Pre-training (CLIP) with Momentum Contrast (MoCo), yielding our proposed model, MoCoCLIP. Our method addresses the challenges posed by class-imbalanced and unlabeled datasets, enabling improved detection of pulmonary pathologies. Experimental results on the NIH ChestXray14 dataset demonstrate that MoCoCLIP outperforms the state-of-the-art CheXZero model, achieving a relative improvement of approximately 6.5%. Furthermore, on the CheXpert dataset, MoCoCLIP demonstrates superior zero-shot performance, achieving an average AUC of 0.750 compared to 0.746 for CheXZero, highlighting its enhanced generalization to unseen data.