Classifying heterogeneous diseases is challenging because of their complexity and the variability of their symptoms and imaging findings. Chronic Obstructive Pulmonary Disease (COPD) is a prime example: it remains underdiagnosed despite being the third leading cause of death. Its sparse, diffuse and heterogeneous appearance on computed tomography (CT) makes supervised binary classification difficult. We reformulate COPD binary classification as an anomaly detection task and propose cOOpD, in which heterogeneous pathological regions are detected as Out-of-Distribution (OOD) with respect to normal, homogeneous lung regions. To this end, we learn representations of unlabeled lung regions with a self-supervised contrastive pretext model, potentially capturing specific characteristics of diseased and healthy regions. A generative model then learns the distribution of healthy representations and identifies abnormalities (stemming from COPD) as deviations from it. Patient-level scores are obtained by aggregating region-level OOD scores. We show that cOOpD achieves the best performance on two public datasets, with an increase of 8.2% and 7.7% in AUROC over the previous supervised state of the art. Additionally, cOOpD yields interpretable spatial anomaly maps and patient-level scores, which we show to be of additional value in identifying individuals at an early stage of disease progression. Experiments in artificially designed settings that mimic real-world prevalence further support anomaly detection as a powerful way of tackling COPD classification.
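The scoring stage described in the abstract (fit a density on healthy region representations, score each region by its deviation, aggregate to a patient-level score) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the generative model or the aggregation rule, so a single Gaussian with a Mahalanobis-distance score stands in for the generative model, the mean stands in for the aggregation, and random vectors stand in for the contrastive representations. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def fit_healthy_gaussian(healthy_reps):
    """Fit a Gaussian to healthy region representations.

    Assumption: a single Gaussian is a stand-in for the generative model.
    """
    mean = healthy_reps.mean(axis=0)
    # A small ridge term keeps the covariance invertible.
    cov = np.cov(healthy_reps, rowvar=False) + 1e-6 * np.eye(healthy_reps.shape[1])
    return mean, np.linalg.inv(cov)


def region_ood_scores(reps, mean, precision):
    """Squared Mahalanobis distance per region; higher means more anomalous."""
    diff = reps - mean
    return np.einsum("ij,jk,ik->i", diff, precision, diff)


def patient_score(region_scores):
    """Aggregate region scores into one patient-level score.

    Assumption: the mean is used here; other aggregations are possible.
    """
    return float(np.mean(region_scores))


# Synthetic stand-ins for learned representations (8-dimensional).
rng = np.random.default_rng(0)
healthy_reps = rng.normal(0.0, 1.0, size=(500, 8))
mean, precision = fit_healthy_gaussian(healthy_reps)

# One healthy patient and one patient with 10 of 50 regions shifted,
# mimicking sparse, localized pathology.
healthy_patient = rng.normal(0.0, 1.0, size=(50, 8))
diseased_patient = healthy_patient.copy()
diseased_patient[:10] += 4.0

healthy_region_scores = region_ood_scores(healthy_patient, mean, precision)
diseased_region_scores = region_ood_scores(diseased_patient, mean, precision)
healthy_score = patient_score(healthy_region_scores)
diseased_score = patient_score(diseased_region_scores)
```

In this toy setup the per-region scores play the role of the spatial anomaly map, and the aggregated value plays the role of the patient-level score. Because only a fraction of regions is shifted, the choice of aggregation determines how strongly sparse abnormalities influence the final score.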