Vision Transformers (ViTs) have transformed medical image analysis, outperforming conventional Convolutional Neural Networks (CNNs) in key tasks such as polyp classification, detection, and segmentation. Because attention lets them focus on relevant image regions, ViTs capture context when processing visual data, which yields robust and precise predictions even for complex medical images. Moreover, the self-attention mechanism in Transformers accommodates varying input sizes and resolutions, a flexibility that traditional CNNs lack. However, self-attention also brings challenges such as excessive memory usage and limited training parallelism, which make Transformers impractical for real-time disease detection on resource-constrained devices. In this study, we address these hurdles by integrating the recently introduced retention mechanism into polyp segmentation and present RetSeg, an encoder-decoder network built from multi-head retention blocks. Inspired by Retentive Networks (RetNet), RetSeg is designed to bridge the gap between accurate polyp segmentation and efficient resource utilization, and it is tailored specifically to colonoscopy images. We train and validate RetSeg for polyp segmentation on two publicly available datasets, Kvasir-SEG and CVC-ClinicDB. We also show RetSeg's promising performance on diverse public datasets, including CVC-ColonDB, ETIS-LaribPolypDB, CVC-300, and BKAI-IGH NeoPolyp. Because this work is an early-stage exploration, further in-depth studies are needed to build on these promising findings.
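The retention mechanism mentioned above can be illustrated with a minimal sketch. It assumes the simplified single-head retention operator from RetNet, without xPos rotation, per-head decay rates, group normalization, or gating, applied to a flattened sequence of patch tokens. It shows that the parallel form, which suits training, and the recurrent form, which keeps a fixed-size state, produce identical outputs. This is not RetSeg's implementation. The function names `retention_parallel` and `retention_recurrent`, the decay value `gamma`, and the toy inputs are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def retention_parallel(q: Matrix, k: Matrix, v: Matrix, gamma: float) -> Matrix:
    """Hypothetical helper: parallel form (Q K^T * D) V, where the causal decay
    mask is D[n][m] = gamma**(n - m) for n >= m and 0 otherwise."""
    scores = matmul(q, transpose(k))
    n = len(q)
    masked = [[scores[i][j] * gamma ** (i - j) if i >= j else 0.0 for j in range(n)]
              for i in range(n)]
    return matmul(masked, v)


def retention_recurrent(q: Matrix, k: Matrix, v: Matrix, gamma: float) -> Matrix:
    """Hypothetical helper: recurrent form S_n = gamma * S_{n-1} + k_n^T v_n,
    output_n = q_n S_n. The state is d_k x d_v regardless of sequence length."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the previous state and add the outer product k_n^T v_n.
        state = [[gamma * state[a][b] + k_n[a] * v_n[b] for b in range(d_v)]
                 for a in range(d_k)]
        outputs.append([sum(q_n[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return outputs


if __name__ == "__main__":
    # Toy sequence of three tokens with d_k = d_v = 2.
    q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
    k = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    v = [[1.0, 2.0], [3.0, 0.0], [0.5, 1.5]]
    par = retention_parallel(q, k, v, gamma=0.9)
    rec = retention_recurrent(q, k, v, gamma=0.9)
    diff = max(abs(x - y) for rp, rr in zip(par, rec) for x, y in zip(rp, rr))
    print("max difference between forms:", diff)
```

The parallel form computes every position at once, which maps well onto GPU training. The recurrent form updates a state whose size does not grow with the sequence, the property RetNet relies on for low-cost inference. How RetSeg applies retention to 2D feature maps is not shown here.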