Medical image segmentation, a crucial task in computer vision, facilitates the automated delineation of anatomical structures and pathologies, supporting clinicians in diagnosis, treatment planning, and disease monitoring. Notably, transformers employing shifted window-based self-attention have demonstrated exceptional performance. However, their reliance on local window attention limits the fusion of local and global contextual information, which is crucial for segmenting microtumors and miniature organs. To address this limitation, we propose the Adaptive Semantic Segmentation Network (ASSNet), a transformer architecture that effectively integrates local and global features for precise medical image segmentation. ASSNet comprises a transformer-based U-shaped encoder-decoder network. The encoder utilizes shifted window self-attention across five resolutions to extract multi-scale features, which are then propagated to the decoder through skip connections. We introduce an augmented multi-layer perceptron within the encoder to explicitly model long-range dependencies during feature extraction. Recognizing the constraints of conventional symmetrical encoder-decoder designs, we propose an Adaptive Feature Fusion (AFF) decoder to complement our encoder. This decoder incorporates three key components: the Long Range Dependencies (LRD) block, the Multi-Scale Feature Fusion (MFF) block, and the Adaptive Semantic Center (ASC) block. These components synergistically facilitate the effective fusion of multi-scale features extracted by the encoder while capturing long-range dependencies and refining object boundaries. Comprehensive experiments on diverse medical image segmentation tasks, including multi-organ, liver tumor, and bladder tumor segmentation, demonstrate that ASSNet achieves state-of-the-art results. Code and models are available at: \url{https://github.com/lzeeorno/ASSNet}.
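To make the shifted-window mechanism concrete, the following is a minimal NumPy sketch of window-based self-attention and its shifted variant. This is not the authors' implementation: learned Q/K/V projections, multi-head splitting, and the attention masking that Swin-style models apply to wrapped windows are all omitted, and the function names (`window_attention`, `shifted_window_attention`) are illustrative.

```python
import numpy as np

def window_attention(x, window):
    """Self-attention computed independently within non-overlapping windows.

    x: (H, W, C) feature map; H and W must be divisible by `window`.
    Learned Q/K/V projections are omitted to keep the sketch minimal.
    """
    H, W, C = x.shape
    out = np.empty_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            win = x[i:i + window, j:j + window].reshape(-1, C)  # (window*window, C)
            scores = win @ win.T / np.sqrt(C)                   # scaled dot-product
            attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
            attn /= attn.sum(axis=-1, keepdims=True)            # softmax over window tokens
            out[i:i + window, j:j + window] = (attn @ win).reshape(window, window, C)
    return out

def shifted_window_attention(x, window):
    """Shift the map by half a window, attend, then shift back.

    Alternating plain and shifted windows lets information cross window
    borders; the boundary masking used in practice is omitted here.
    """
    s = window // 2
    shifted = np.roll(x, shift=(-s, -s), axis=(0, 1))
    return np.roll(window_attention(shifted, window), shift=(s, s), axis=(0, 1))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16)).astype(np.float32)
y = window_attention(feat, window=4)
z = shifted_window_attention(feat, window=4)
print(y.shape, z.shape)  # (8, 8, 16) (8, 8, 16)
```

Because attention within a window is a convex combination of that window's tokens, a token can only mix with the other tokens in its own window; alternating the shifted variant is what restores cross-window information flow without paying for global attention.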