Medical image segmentation plays a crucial role in healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. In recent years, Vision Transformers (ViTs) have emerged as a promising technique for addressing the challenges of medical image segmentation. Structures in medical images are usually highly interconnected and globally distributed, and ViTs use their self-attention mechanism to model these long-range relationships. However, ViTs lack image-specific inductive biases such as locality and translational invariance, which can limit their performance. To address this, researchers have recently proposed ViT-based approaches that incorporate CNNs into their architectures, known as Hybrid Vision Transformers (HVTs), to capture local correlations in addition to the global information in the images. This survey provides a detailed review of recent advances in ViTs and HVTs for medical image segmentation. Along with a categorization of ViT- and HVT-based medical image segmentation approaches, we present a detailed overview of their real-time applications across several medical imaging modalities. This survey may serve as a valuable resource for researchers, healthcare practitioners, and students seeking to understand state-of-the-art approaches to ViT-based medical image segmentation.