Vision Transformers (ViTs) have reshaped medical image analysis, outperforming conventional Convolutional Neural Networks (CNNs) in key tasks such as polyp classification, detection, and segmentation. By using attention mechanisms to focus on specific image regions, ViTs process visual data with contextual awareness, which yields robust and precise predictions even for intricate medical images. Moreover, the self-attention mechanism inherent to Transformers accommodates varying input sizes and resolutions, a flexibility that traditional CNNs lack. However, self-attention also brings excessive memory usage and limited training parallelism, which makes Transformers impractical for real-time disease detection on resource-constrained devices. In this study, we address these hurdles by investigating the integration of the recently introduced retention mechanism into polyp segmentation, and we introduce RetSeg, an encoder-decoder network featuring multi-head retention blocks. Drawing inspiration from Retentive Networks (RetNet), RetSeg is designed to bridge the gap between precise polyp segmentation and resource utilization, and is tailored to colonoscopy images. We train and validate RetSeg for polyp segmentation on two publicly available datasets: Kvasir-SEG and CVC-ClinicDB. Additionally, we showcase RetSeg's promising performance across diverse public datasets, including CVC-ColonDB, ETIS-LaribPolypDB, CVC-300, and BKAI-IGH NeoPolyp. Our work is an early-stage exploration, and further in-depth studies are needed to build on these promising findings.
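To make the retention mechanism mentioned above more concrete, the following is a minimal single-head sketch of retention as formulated in the RetNet work, written in plain Python. It is not RetSeg's implementation: the function names, the toy dimensions, and the decay value `gamma = 0.9` are assumptions chosen for illustration, and the sketch omits the positional rotation, normalization, gating, and per-head decay rates that a full multi-head retention block would include. It shows that the parallel form (a decay-masked product of queries and keys) and the recurrent form (a fixed-size state updated once per token) produce the same output, which is the property that lets a retention layer trade a quadratic score matrix for a constant-size state at inference time.

```python
import random


def retention_parallel(Q, K, V, gamma):
    """Parallel form: out_n = sum over m <= n of gamma**(n - m) * (Q_n . K_m) * V_m."""
    n_tokens, d_v = len(Q), len(V[0])
    out = []
    for n in range(n_tokens):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal decay mask: only positions m <= n contribute
            score = sum(q * k for q, k in zip(Q[n], K[m])) * gamma ** (n - m)
            for j in range(d_v):
                row[j] += score * V[m][j]
        out.append(row)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n, then out_n = Q_n S_n."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # state size is independent of sequence length
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = gamma * S[i][j] + k[i] * v[j]
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Toy input: 5 tokens (e.g. flattened image patches, an assumption) with 3-dim Q/K and 2-dim V.
rng = random.Random(0)
Q = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
K = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
V = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)]

par = retention_parallel(Q, K, V, gamma=0.9)
rec = retention_recurrent(Q, K, V, gamma=0.9)
max_diff = max(abs(a - b) for ra, rb in zip(par, rec) for a, b in zip(ra, rb))
print("parallel and recurrent forms agree:", max_diff < 1e-9)
```

In the recurrent form the only quantity carried between tokens is the `d_k × d_v` state `S`, so per-token memory does not grow with the number of patches. This is the kind of saving that motivates exploring retention for resource-constrained settings.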