RetSeg: Retention-based Colorectal Polyps Segmentation Network

Vision Transformers (ViTs) have revolutionized medical image analysis, outperforming conventional Convolutional Neural Networks (CNNs) in vital tasks such as polyp classification, detection, and segmentation. By using attention mechanisms to focus on specific image regions, ViTs capture contextual information when processing visual data. This produces robust and precise predictions, even for intricate medical images. Moreover, the self-attention mechanism inherent in Transformers accommodates varying input sizes and resolutions, a flexibility that traditional CNNs lack. However, Transformers face challenges such as excessive memory usage and limited training parallelism due to self-attention. These challenges make them impractical for real-time disease detection on resource-constrained devices. In this study, we address these hurdles by investigating the integration of the recently introduced retention mechanism into polyp segmentation. We introduce RetSeg, an encoder-decoder network featuring multi-head retention blocks. Drawing inspiration from Retentive Networks (RetNet), RetSeg is designed to bridge the gap between precise polyp segmentation and efficient resource utilization, and is specifically tailored to colonoscopy images. We train and validate RetSeg for polyp segmentation on two publicly available datasets: Kvasir-SEG and CVC-ClinicDB. Additionally, we demonstrate RetSeg's promising performance on several other public datasets, including CVC-ColonDB, ETIS-LaribPolypDB, CVC-300, and BKAI-IGH NeoPolyp. Our work represents an early-stage exploration, and further in-depth studies are needed to advance these promising findings.
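
The abstract does not define the retention mechanism itself. For orientation only, the following minimal NumPy sketch implements a simplified version of the retention operator described in the RetNet paper. It is applied to a generic token sequence, such as flattened image patches. The sketch illustrates the property behind the efficiency argument: the same output can be computed in parallel over all tokens, as during training, or recurrently with a fixed-size state, as during low-memory inference. It is a simplification under stated assumptions and not RetSeg's implementation. It omits RetNet's rotation-based position encoding, normalization, and gating, and it uses random weights. The per-head decay rates follow the RetNet paper's choice `1 - 2**(-5 - h)` rather than anything stated for RetSeg. The function names `retention_parallel`, `retention_recurrent`, and `multi_head_retention` are hypothetical. How RetSeg tokenizes images and wires retention blocks into its encoder-decoder is not described in this abstract.

```python
import numpy as np

# Hypothetical, simplified retention (after RetNet); NOT RetSeg's actual code.
# Omits position rotation, group normalization, and output gating.


def retention_parallel(q, k, v, gamma):
    """Parallel (training-style) form: ((Q K^T) * D) V with a causal decay mask D."""
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # diff[i, j] = i - j
    # D[i, j] = gamma**(i - j) when j <= i, else 0 (causal exponential decay).
    decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * decay) @ v


def retention_recurrent(q, k, v, gamma):
    """Recurrent (inference-style) form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n."""
    state = np.zeros((q.shape[1], v.shape[1]))  # fixed d_k x d_v state, independent of n
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = gamma * state + np.outer(k_n, v_n)
        outputs.append(q_n @ state)
    return np.stack(outputs)


def multi_head_retention(x, w_q, w_k, w_v, num_heads):
    """Split projections into heads, give each head its own decay rate, concatenate."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Assumed per-head decay schedule, taken from the RetNet paper.
    gammas = 1.0 - 2.0 ** (-5.0 - np.arange(num_heads))
    heads = [
        retention_parallel(qh, kh, vh, g)
        for qh, kh, vh, g in zip(
            np.split(q, num_heads, axis=1),
            np.split(k, num_heads, axis=1),
            np.split(v, num_heads, axis=1),
            gammas,
        )
    ]
    return np.concatenate(heads, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_tokens, dim, num_heads = 16, 8, 2  # arbitrary toy sizes
    x = rng.standard_normal((num_tokens, dim))
    w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    gap = np.abs(retention_parallel(q, k, v, 0.9) - retention_recurrent(q, k, v, 0.9)).max()
    print("max |parallel - recurrent|:", gap)
    print("multi-head output shape:", multi_head_retention(x, w_q, w_k, w_v, num_heads).shape)
```

The two forms agree up to floating-point rounding. The recurrent loop only stores a `d_k × d_v` state per head, however many tokens it has processed. That is the sense in which retention can trade the parallel form used for training for a constant-memory form at inference time.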
