Because colorectal cancer is closely associated with polyps, diagnosing and identifying colorectal polyps plays a critical role in the detection of, and surgical intervention for, colorectal cancer. Consequently, the automatic detection and segmentation of polyps in colonoscopy images has become an important problem that has attracted broad attention. Current polyp segmentation techniques face several challenges. First, polyps vary in size, texture, color, and pattern. Second, the boundaries between polyps and the surrounding mucosa are usually blurred. Moreover, existing studies have focused on learning local polyp features while ignoring long-range dependencies among features, and they also neglect the local and global contextual information of the combined features. To address these challenges, we propose FLDNet (Foreground-Long-Distance Network), a Transformer-based neural network that captures long-distance dependencies for accurate polyp segmentation. Specifically, the proposed model consists of three main modules: a pyramid-based Transformer encoder, a local context module, and a foreground-aware module. The pyramid-based Transformer encoder first captures multilevel features that carry long-distance dependency information. On the high-level features, the local context module extracts polyp-related local characteristics by constructing different kinds of local context information. A coarse map, obtained by decoding the reconstructed highest-level features, then guides the fusion of the high-level features in the foreground-aware module to enhance the polyp foreground. We evaluated FLDNet with seven metrics on common datasets, and it outperformed state-of-the-art methods on widely used evaluation measures.
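To make the idea of coarse-map-guided foreground enhancement more concrete, the following is a minimal sketch in plain Python. It is not FLDNet's actual foreground-aware module, whose exact fusion rule is not given in this abstract. The function name `foreground_enhance` and the reweighting rule `feature * (1 + sigmoid(coarse_logits))` are assumptions chosen only for illustration, and the feature map is reduced to a single 2D channel.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def foreground_enhance(feature, coarse_logits):
    """Reweight a 2D feature map by a foreground probability map.

    Hypothetical rule (an assumption, not the paper's formula):
        out = feature * (1 + sigmoid(coarse_logits))
    Likely-foreground positions are amplified by up to 2x, while
    likely-background positions are left almost unchanged.
    """
    same_rows = len(feature) == len(coarse_logits)
    same_cols = all(len(f) == len(c) for f, c in zip(feature, coarse_logits))
    if not (same_rows and same_cols):
        raise ValueError("feature and coarse map must have the same shape")
    return [
        [f * (1.0 + sigmoid(c)) for f, c in zip(f_row, c_row)]
        for f_row, c_row in zip(feature, coarse_logits)
    ]


# Toy 2x2 example: large positive logits mark confident foreground.
feature = [[1.0, 1.0], [2.0, 2.0]]
coarse = [[10.0, -10.0], [0.0, 10.0]]
enhanced = foreground_enhance(feature, coarse)
```

In this toy example, positions with strongly positive coarse logits are nearly doubled, positions with strongly negative logits stay close to their input value, and an uncertain position (logit 0) is scaled by 1.5. A real implementation would operate on multi-channel tensors and would typically upsample the coarse map to each feature resolution first.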