Self-supervised learning methods based on data augmentations, such as SimCLR, BYOL, or DINO, yield semantically meaningful representations of image datasets and are widely used prior to supervised fine-tuning. A recent self-supervised learning method, $t$-SimCNE, uses contrastive learning to directly train a 2D representation suitable for visualisation. When applied to natural image datasets, $t$-SimCNE yields 2D visualisations with semantically meaningful clusters. In this work, we used $t$-SimCNE to visualise medical image datasets, including examples from dermatology, histology, and blood microscopy. We found that extending the set of data augmentations to include arbitrary rotations improved class separability compared to the data augmentations used for natural images. Our 2D representations show medically relevant structures and can be used to aid data exploration and annotation, improving on common approaches for data visualisation.