Interstitial lung diseases (ILD) present diagnostic challenges due to their varied manifestations and overlapping imaging features. To address this, we propose a machine learning approach that uses CLIP, a multimodal (image and text) self-supervised model, for ILD classification. We use zero-shot CLIP at every stage of our workflow, from the initial extraction of image patches from volumetric CT scans to the final ILD classification on "patch montages". Furthermore, we investigate how domain-adaptive pretraining (DAPT) of CLIP on task-specific images (CT "patch montages" extracted with ILD-specific prompts for CLIP) and/or text (lung-specific sections of radiology reports) affects downstream ILD classification performance. By combining CLIP-extracted "patch montages" with DAPT, we achieve strong zero-shot ILD classification results, including an AUROC of 0.893, without any labeled training data. This work highlights the versatility and potential of multimodal models like CLIP for medical image classification tasks where labeled data is scarce.
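For readers unfamiliar with zero-shot classification, the following is a minimal sketch of the general scoring step, not the implementation used in this work. It assumes that an image embedding (for example, of a "patch montage") and one text embedding per class prompt have already been produced by CLIP's image and text encoders. The function names, the toy three-dimensional vectors, the prompt labels, and the temperature of 0.01 (chosen here to mimic a CLIP-style logit scale of roughly 100) are all illustrative assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def zero_shot_probs(image_emb, prompt_embs, temperature=0.01):
    """Softmax over cosine similarities between one image and each text prompt.

    No labeled training data is involved: the class is decided only by
    which prompt embedding lies closest to the image embedding.
    """
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(p)))
        for p in prompt_embs
    ]
    logits = [s / temperature for s in sims]
    # Subtract the maximum logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical values); real ones would come from CLIP encoders.
prompt_embs = {
    "ILD": [0.9, 0.1, 0.0],
    "no ILD": [0.1, 0.9, 0.0],
}
montage_emb = [0.8, 0.3, 0.1]

probs = zero_shot_probs(montage_emb, list(prompt_embs.values()))
for label, p in zip(prompt_embs, probs):
    print(f"{label}: {p:.4f}")
```

The probability assigned to the positive prompt can then serve as a continuous score per scan, which is the kind of quantity from which a metric such as AUROC is computed.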