This paper presents a novel strategy for selecting positive and negative sets in contrastive learning of medical images, based on labels that can be extracted from clinical data. In the medical field, a variety of labels exist that serve different purposes at different stages of diagnosis and treatment. Clinical labels and biomarker labels are two examples. In general, clinical labels are easier to obtain in large quantities because they are collected during routine clinical care, whereas biomarker labels require expert analysis and interpretation. Within ophthalmology, previous work has shown that clinical values correlate with biomarker structures that manifest in optical coherence tomography (OCT) scans. We exploit this relationship by using the clinical data as pseudo-labels for the data that lack biomarker labels, choosing positive and negative instances accordingly to train a backbone network with a supervised contrastive loss. In this way, the backbone network learns a representation space that aligns with the distribution of the available clinical data. We then fine-tune the resulting network on the smaller amount of biomarker-labeled data with a cross-entropy loss in order to classify these key indicators of disease directly from OCT scans. We also expand on this concept by proposing a method that uses a linear combination of clinical contrastive losses. We benchmark our methods against state-of-the-art self-supervised methods in a novel setting with biomarkers of varying granularity. We show performance improvements of up to 5% in total biomarker detection AUROC.
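The selection strategy described above can be illustrated with a minimal sketch. It assumes the standard supervised contrastive formulation, in which samples sharing a (discretized) clinical value with the anchor are treated as positives and all other samples as negatives. The function names `supcon_loss` and `combined_clinical_loss`, the temperature of 0.1, the label names, and the weights are hypothetical choices for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def _normalize(vector):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss with clinical values as pseudo-labels.

    For each anchor, samples with the same label are positives and all
    remaining samples are negatives. Anchors without positives are skipped.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all non-anchor samples, stabilized by the max.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_clinical_loss(embeddings, clinical_labels, weights, temperature=0.1):
    """Linear combination of one contrastive loss per clinical label type."""
    return sum(
        weights[name] * supcon_loss(embeddings, labels, temperature)
        for name, labels in clinical_labels.items()
    )


# Toy usage: four 2-D embeddings and two hypothetical clinical label types.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_clinical_labels = {"clinical_a": [0, 0, 1, 1], "clinical_b": [0, 1, 0, 1]}
toy_weights = {"clinical_a": 1.0, "clinical_b": 0.5}
toy_loss = combined_clinical_loss(toy_embeddings, toy_clinical_labels, toy_weights)
```

In a real pipeline the embeddings would come from the backbone network and the loss would be computed with an automatic-differentiation framework; plain Python is used here only to keep the arithmetic visible.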