Vision Transformers (ViTs) have substantially advanced medical image analysis, outperforming conventional Convolutional Neural Networks (CNNs) on key tasks such as polyp classification, detection, and segmentation. By using attention to focus on relevant image regions, ViTs process visual data with contextual awareness and produce robust, precise predictions even on intricate medical images. Moreover, the self-attention mechanism in Transformers accommodates varying input sizes and resolutions, a flexibility that traditional CNNs lack. However, self-attention also burdens Transformers with excessive memory usage and limited training parallelism, which makes them impractical for real-time disease detection on resource-constrained devices. In this study, we address these hurdles by investigating the integration of the recently introduced retention mechanism into polyp segmentation, and we present RetSeg, an encoder-decoder network built from multi-head retention blocks. Inspired by Retentive Networks (RetNet), RetSeg is designed to bridge the gap between precise polyp segmentation and efficient resource utilization, and is tailored to colonoscopy images. We train and validate RetSeg on two publicly available datasets, Kvasir-SEG and CVC-ClinicDB. We also report promising performance on further public datasets: CVC-ColonDB, ETIS-LaribPolypDB, CVC-300, and BKAI-IGH NeoPolyp. Our work is an early-stage exploration, and further in-depth studies are needed to build on these promising findings.
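To make the retention mechanism mentioned above more concrete, the following is a minimal sketch of single-head retention as formulated for RetNet, written in plain Python over a short 1D token sequence. It is not RetSeg's implementation: the function names `retention_parallel` and `retention_recurrent`, the toy inputs, and the decay value `gamma=0.5` are assumptions made for illustration, and the sketch leaves out RetNet's position rotation, normalization, and gating. It only shows that the parallel form (suited to training) and the recurrent form (which keeps a fixed-size state at inference) produce the same outputs.

```python
def retention_parallel(q, k, v, gamma):
    """Parallel form: out[n] = sum over m <= n of gamma**(n-m) * (q[n] . k[m]) * v[m]."""
    d_v = len(v[0])
    out = []
    for n in range(len(q)):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal decay mask: only positions m <= n contribute
            score = sum(a * b for a, b in zip(q[n], k[m])) * gamma ** (n - m)
            for j in range(d_v):
                row[j] += score * v[m][j]
        out.append(row)
    return out


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k[n]^T v[n]; out[n] = q[n] S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of sequence length
    out = []
    for q_n, k_n, v_n in zip(q, k, v):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = gamma * state[i][j] + k_n[i] * v_n[j]
        out.append([sum(q_n[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Toy inputs (hypothetical values): 3 tokens, key/query dim 2, value dim 1.
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
k = [[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]

par = retention_parallel(q, k, v, gamma=0.5)
rec = retention_recurrent(q, k, v, gamma=0.5)
print(par)  # [[1.0], [1.75], [4.5]]
print(rec)  # same values, computed one token at a time
```

The recurrent variant stores only a `d_k` by `d_v` state between steps, which is the property that makes retention attractive when memory is constrained.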