Self-supervised learning (SSL) has been widely applied to learn image representations by exploiting unlabeled images. However, it has not been fully explored in the field of medical image analysis. In this work, we propose the Saliency-guided Self-Supervised image Transformer (SSiT) for diabetic retinopathy (DR) grading from fundus images. As a novel contribution, we introduce saliency maps into SSL, with the goal of guiding self-supervised pre-training with domain-specific prior knowledge. Specifically, two saliency-guided learning tasks are employed in SSiT: (1) We conduct saliency-guided contrastive learning based on momentum contrast, wherein we use the saliency maps of fundus images to remove trivial patches from the input sequences of the momentum-updated key encoder. The key encoder is thus constrained to provide target representations that focus on salient regions, guiding the query encoder to capture salient features. (2) We train the query encoder to predict the saliency segmentation, which encourages the learned representations to preserve fine-grained information. Extensive experiments are conducted on four publicly accessible fundus image datasets. The proposed SSiT significantly outperforms other representative state-of-the-art SSL methods on all datasets and under various evaluation settings, demonstrating the effectiveness of the representations learned by SSiT. The source code is available at https://github.com/YijinHuang/SSiT.
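The first task can be illustrated with a minimal sketch. It is not taken from the SSiT repository: the function names, the top-k selection rule, the `keep_ratio` parameter, and the momentum value are all assumptions made for illustration. The abstract states only that trivial patches are removed from the key encoder's input using saliency maps and that the key encoder is momentum-updated.

```python
from typing import List, Sequence


def patch_saliency(saliency: Sequence[Sequence[float]], patch: int) -> List[float]:
    """Mean saliency of each non-overlapping patch, in row-major patch order."""
    height, width = len(saliency), len(saliency[0])
    scores = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            total = sum(
                saliency[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            scores.append(total / (patch * patch))
    return scores


def select_salient_patches(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the most salient patches, kept in original sequence order.

    Assumption: patches are ranked by mean saliency and the top fraction is
    kept; the actual selection rule in SSiT may differ.
    """
    num_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_keep])


def momentum_update(key_params: Sequence[float],
                    query_params: Sequence[float],
                    m: float = 0.99) -> List[float]:
    """Exponential moving average used by momentum-contrast style key encoders."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Toy 4x4 saliency map split into four 2x2 patches (hypothetical values).
saliency_map = [
    [0.0, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.7, 1.0],
    [0.2, 0.2, 0.0, 0.0],
    [0.2, 0.2, 0.1, 0.0],
]
scores = patch_saliency(saliency_map, patch=2)
# Only these patch tokens would be fed to the key encoder.
key_patch_ids = select_salient_patches(scores, keep_ratio=0.5)
```

In this toy example the top-right and bottom-left patches carry the highest mean saliency, so only those two tokens would reach the key encoder, while the query encoder would still see the full patch sequence.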