CUTS: A Fully Unsupervised Framework for Medical Image Segmentation

In this work we introduce CUTS (Contrastive and Unsupervised Training for Segmentation), a fully unsupervised deep learning framework for medical image segmentation, designed to make better use of the vast majority of imaging data, which is neither labeled nor annotated. We derive self-supervision from the pixels of each image and their local neighborhoods. Our unsupervised approach optimizes a training objective that combines concepts from contrastive learning and autoencoding. The framework segments medical images with a novel two-stage approach that does not rely on labeled data at any stage. In the first stage, every pixel is represented by a "pixel-centered patch," the pixel together with its surrounding neighborhood, which is embedded as a vector in a high-dimensional latent space. In the second stage, diffusion condensation, a multi-scale topological data analysis approach, dynamically coarse-grains these embedding vectors across all levels of granularity. The result is a series of coarse-to-fine segmentations that highlight image structures at various scales. We show successful multi-scale segmentation on natural images, retinal fundus images, and brain MRI images. Our framework delineates structures and patterns at different scales which, in medical images, may carry distinct information relevant to clinical interpretation. Quantitatively, across three medical image datasets, our framework achieves improvements ranging from 10% to 200% in Dice coefficient and Hausdorff distance compared to existing unsupervised methods. By segmenting medical images at multiple meaningful granularities without relying on any labels, we hope to demonstrate that tedious and repetitive manual annotation could be circumvented in future practice.
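
The two-stage pipeline described above can be illustrated with a toy NumPy sketch. It is not the CUTS implementation and rests on several labeled assumptions: the helper names `pixel_centered_patches` and `diffusion_condensation` are hypothetical; stage 1 uses the raw flattened patch as the embedding, whereas CUTS trains an encoder with its contrastive and autoencoding objective; and stage 2 is a simplified diffusion condensation with a fixed Gaussian bandwidth `sigma` and a merge threshold `merge_eps` (both illustrative parameters). Pairwise distances are computed densely, so memory grows as O(N²) in the number of pixels and the sketch only suits tiny images.

```python
import numpy as np


def pixel_centered_patches(image: np.ndarray, patch_size: int = 5) -> np.ndarray:
    """Stage 1 stand-in (hypothetical helper): one flattened patch per pixel.

    Returns an array of shape (H * W, patch_size**2 * C). Borders use reflect
    padding, which is an assumption. CUTS feeds patches to a trained encoder;
    here the raw patch itself serves as the embedding.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so the patch has a centre pixel")
    if image.ndim == 2:
        image = image[..., None]
    h, w, _ = image.shape
    r = patch_size // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    # windows has shape (H, W, C, patch_size, patch_size)
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (patch_size, patch_size), axis=(0, 1)
    )
    return windows.reshape(h * w, -1).astype(float)


def _connected_components(adj: np.ndarray) -> np.ndarray:
    """Label connected components of a boolean adjacency matrix (depth-first search)."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def diffusion_condensation(x, sigma=1.0, merge_eps=1e-3, max_iter=100):
    """Stage 2 sketch (simplified): return a list of label arrays, fine to coarse."""
    x = np.asarray(x, dtype=float).copy()
    history = []
    for _ in range(max_iter):
        d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
        p = np.exp(-d2 / (2.0 * sigma**2))
        p /= p.sum(axis=1, keepdims=True)  # row-stochastic diffusion operator
        x = p @ x  # each point moves to a weighted average of its neighbours
        dist = np.sqrt(np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1))
        labels = _connected_components(dist < merge_eps)
        # snap merged points to their centroid so they stay merged afterwards
        for c in range(labels.max() + 1):
            x[labels == c] = x[labels == c].mean(axis=0)
        if not history or labels.max() < history[-1].max():
            history.append(labels)  # keep only levels where clusters merged
        if labels.max() == 0:
            break  # everything has condensed into a single cluster
    return history
```

Each returned label array can be reshaped to `(H, W)` to view one segmentation level. For example, on an 8×8 image whose left and right halves are constant, all pixels in a column share the same patch, so every level labels whole columns, and successive levels grow coarser as diffusion pulls the boundary patches together.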
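
The quantitative results are reported in terms of the Dice coefficient and the Hausdorff distance. A minimal NumPy sketch of both for binary masks follows; the function names are illustrative, and it computes the plain (maximum) symmetric Hausdorff distance in pixel units, since the abstract does not state which variant or pixel spacing the evaluation uses.

```python
import numpy as np


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks (1.0 is perfect)."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def hausdorff(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Hausdorff distance (in pixels) between the foreground pixel sets.

    Returns inf if either mask is empty (a convention chosen for this sketch).
    """
    a = np.argwhere(pred.astype(bool)).astype(float)
    b = np.argwhere(target.astype(bool)).astype(float)
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    # largest distance from any point in one set to its nearest point in the other
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

Dice ranges from 0 to 1 and higher is better, while a lower Hausdorff distance is better, so an improvement in Hausdorff distance corresponds to a reduction. For example, a 3×3 square shifted one pixel to the right overlaps its original in 6 of 9 pixels, giving Dice = 2·6/(9+9) ≈ 0.667 and a Hausdorff distance of 1 pixel.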

翻译：在本文中，我们提出CUTS（对比与无监督训练分割框架），一种完全无监督的深度学习医学图像分割框架，旨在更好地利用大量未标注的影像数据。该方法利用像素及其局部邻域在图像本身中的自监督信息。我们的无监督方法优化了一个结合对比学习与自编码理念的训练目标。该框架采用新颖的两阶段方法实现医学图像分割，且任何阶段均不依赖标注数据。第一阶段构建“像素中心块”，将每个像素及其周围块嵌入高维潜在空间中的向量表示。第二阶段采用扩散凝聚（一种多尺度拓扑数据分析方法）动态粗粒化这些嵌入向量，覆盖所有粒度层级。最终输出一系列从粗到细的分割结果，突出显示不同尺度的图像结构。本文展示了该方法在自然图像、视网膜眼底图像及脑部磁共振图像上的成功多尺度分割。该框架可描绘不同尺度的结构与模式，对于医学图像而言，这些尺度可能包含与临床解读相关的不同信息。定量结果表明，与现有无监督方法相比，该框架在三个医学图像数据集上的Dice系数和豪斯多夫距离分别提升了10%至200%。通过解决在无标注条件下对医学图像进行多粒度有意义分割的问题，我们期望展示未来实践中规避繁琐且重复的人工标注的可能性。