This study investigates the impact of out-of-distribution (OOD) radiographs on existing chest X-ray classification models and aims to increase their robustness against OOD data. We employed the commonly used chest X-ray classification model CheXNet, trained on the ChestX-ray14 data set, and tested its robustness against OOD data using three public radiography data sets (IRMA, Bone Age, and MURA) and the ImageNet data set. To detect OOD data in multi-label classification, we propose in-distribution voting (IDV). OOD detection performance was measured across data sets using the area under the receiver operating characteristic curve (AUC) and compared with Mahalanobis-based OOD detection, MaxLogit, MaxEnergy, and self-supervised OOD detection (SS OOD). Without additional OOD detection, the chest X-ray classifier failed to discard any OOD images, yielding an AUC of 0.5. The proposed IDV approach, trained on in-distribution (ID) data (ChestX-ray14) and OOD data (IRMA and ImageNet), achieved an average OOD detection AUC of 0.999 across the three data sets, surpassing all other OOD detection methods. Mahalanobis-based OOD detection achieved an average OOD detection AUC of 0.982. IDV trained solely on a few thousand ImageNet images achieved an AUC of 0.913, higher than MaxLogit (0.726), MaxEnergy (0.724), and SS OOD (0.476). Except for Mahalanobis-based OOD detection and the proposed IDV method, none of the tested OOD detection methods translated well to radiography data sets. Training solely on ID data led to OOD images being misclassified as ID, increasing false positive rates. IDV substantially improved the model's ID classification performance, even when trained with data that will not occur in the intended use case or test set, and without additional inference overhead.
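Two of the baseline scores and the evaluation metric named above can be made concrete. The following is a minimal sketch that assumes a multi-label classifier emitting one raw logit per finding, with higher scores meaning "more in-distribution". MaxLogit takes the largest logit. For MaxEnergy, we assume a per-label energy log(1 + e^z) maximised over labels. This is a hypothetical multi-label adaptation and not necessarily the variant used in this study. All function names are hypothetical. The AUC is computed with the rank (Mann-Whitney) formulation, which also shows why a detector that gives every image the same score, and therefore discards nothing, lands at exactly 0.5.

```python
import math
from typing import Sequence


def max_logit_score(logits: Sequence[float]) -> float:
    """MaxLogit: the largest raw logit across labels (higher = more ID)."""
    return max(logits)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def max_energy_score(logits: Sequence[float]) -> float:
    """Assumed multi-label energy score: per-label sigmoid energy
    log(1 + exp(z)), maximised over labels (higher = more ID)."""
    return max(softplus(z) for z in logits)


def ood_auc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """ROC AUC for separating ID (positives) from OOD (negatives):
    the probability that a random ID score exceeds a random OOD score,
    counting ties as one half."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


if __name__ == "__main__":
    # Hypothetical logits for two ID and two OOD images (two labels each).
    id_logits = [[2.0, -1.0], [1.5, 0.5]]
    ood_logits = [[-0.5, 0.1], [0.3, -2.0]]
    id_s = [max_logit_score(x) for x in id_logits]     # [2.0, 1.5]
    ood_s = [max_logit_score(x) for x in ood_logits]   # [0.1, 0.3]
    print(ood_auc(id_s, ood_s))  # 1.0

    # No OOD detection: every image gets the same score, so nothing is discarded.
    print(ood_auc([0.0] * 4, [0.0] * 3))  # 0.5
```

The quadratic pairwise loop is used here for clarity. For thousands of images, a sort-based rank computation or an existing ROC AUC routine would be the practical choice.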