Deep learning has the potential to automate screening, monitoring and grading of disease in medical images. Pretraining with contrastive learning enables models to extract robust and generalisable features from natural image datasets, facilitating label-efficient downstream image analysis. However, the direct application of conventional contrastive methods to medical datasets introduces two domain-specific issues. Firstly, several image transformations that have been shown to be crucial for effective contrastive learning do not translate from the natural image to the medical image domain. Secondly, the assumption made by conventional methods, that any two images are dissimilar, is systematically misleading in medical datasets depicting the same anatomy and disease. This is exacerbated in longitudinal image datasets that repeatedly image the same patient cohort to monitor their disease progression over time. In this paper we tackle these issues by extending conventional contrastive frameworks with a novel metadata-enhanced strategy. Our approach uses widely available patient metadata to approximate the true set of inter-image contrastive relationships. To this end, we employ records of patient identity, eye position (i.e. left or right) and time-series information. In experiments using two large longitudinal datasets containing 170,427 retinal OCT images of 7,912 patients with age-related macular degeneration (AMD), we evaluate the utility of using metadata to incorporate the temporal dynamics of disease progression into pretraining. Our metadata-enhanced approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks related to AMD. Due to its modularity, our method can be quickly and cost-effectively tested to establish the potential benefits of including available metadata in contrastive pretraining.
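As a concrete illustration of the strategy described above, the sketch below implements a supervised-contrastive-style objective (in the spirit of SupCon, Khosla et al. 2020) in which positive pairs are defined by matching patient metadata rather than by augmentation alone. The function name `metadata_contrastive_loss`, the specific positive-pair rule (same patient and same eye), and the omission of the time-series component are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def metadata_contrastive_loss(embeddings, patient_ids, eye_ids, temperature=0.1):
    """Contrastive loss whose positives are defined by patient metadata
    (same patient, same eye) instead of augmentation alone, approximating
    the true inter-image similarity structure in longitudinal data.

    embeddings : (N, D) feature vectors from the encoder
    patient_ids: (N,) integer patient identifiers
    eye_ids    : (N,) eye position, e.g. 0 = left, 1 = right
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                 # (N, N) scaled cosine similarities
    n = z.size(0)

    # Positive mask: scans of the same patient's same eye (e.g. different visits).
    # NOTE: the paper also uses time-series records, omitted here for brevity.
    pos = (patient_ids[:, None] == patient_ids[None, :]) & \
          (eye_ids[:, None] == eye_ids[None, :])
    pos.fill_diagonal_(False)                   # exclude self-pairs

    # Exclude self-similarity from the softmax denominator.
    logits_mask = ~torch.eye(n, dtype=torch.bool, device=z.device)
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))

    # Average log-probability over each anchor's metadata-defined positives.
    pos_counts = pos.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos).sum(dim=1) / pos_counts
    return loss[pos.sum(dim=1) > 0].mean()      # anchors without positives are skipped
```

In this sketch the time-series records could further refine the positive mask, for instance by treating temporally adjacent scans of the same eye as stronger positives than scans separated by years; any such weighting scheme is an extension not reproduced here.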