The data-intensive nature of supervised classification drives researcher interest toward unsupervised approaches, especially for problems such as medical image segmentation, where labeled data is scarce. Building on recent advances of Vision Transformers (ViT) in computer vision, we propose an unsupervised segmentation framework based on a pre-trained DINO-ViT. The proposed method leverages the inherent graph structure within the image to achieve a significant performance gain in medical image segmentation. To this end, we introduce a modularity-based loss function coupled with a Graph Attention Network (GAT) to effectively capture the graph topology underlying the image. Our method achieves state-of-the-art performance, matching or significantly surpassing existing (semi-)supervised techniques such as MedSAM, a Segment Anything Model adapted to medical images. We demonstrate this on two challenging medical image datasets, ISIC-2018 and CVC-ColonDB. This work underscores the potential of unsupervised approaches for advancing medical image analysis when labeled data is scarce. The code is available at https://github.com/mudit-adityaja/UnSegMedGAT.
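The modularity-based objective mentioned above can be sketched as follows. This is a minimal illustration using the standard Newman soft-modularity formulation, where soft cluster assignments (as a GAT head might produce) are scored against the graph's modularity matrix; the toy graph, function names, and assignment matrices are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def modularity_loss(A, C):
    """Negative soft modularity: -Q = -(1/2m) * Tr(C^T (A - d d^T / 2m) C).

    A: (N, N) symmetric adjacency matrix (e.g. kNN graph over patch features).
    C: (N, k) soft cluster-assignment matrix, rows summing to 1.
    Minimizing this loss maximizes modularity, i.e. favors clusters with
    more internal edges than a degree-matched random graph would have.
    """
    d = A.sum(axis=1)                 # node degrees
    two_m = d.sum()                   # 2m = total degree (twice the edge count)
    B = A - np.outer(d, d) / two_m    # modularity matrix
    Q = np.trace(C.T @ B @ C) / two_m
    return -Q

# Toy graph: two triangles joined by a single bridge edge (2-3).
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
A[2, 3] = A[3, 2] = 1.0

# "Good" assignment: each triangle is its own cluster.
C_good = np.zeros((6, 2))
C_good[:3, 0] = 1.0
C_good[3:, 1] = 1.0

# "Bad" assignment: every node split evenly across both clusters.
C_bad = np.full((6, 2), 0.5)

print(modularity_loss(A, C_good) < modularity_loss(A, C_bad))  # True
```

Note that the uniform assignment scores exactly zero modularity (the entries of the modularity matrix sum to zero), while the community-respecting assignment yields a strictly negative loss, which is the gradient signal a GAT head would be trained against.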