Current medical image segmentation approaches are limited in their ability to deeply explore multi-scale information and to effectively combine local detail textures with global contextual semantics, resulting in over-segmentation, under-segmentation, and blurred segmentation boundaries. To tackle these challenges, we explore multi-scale feature representations from different perspectives and propose LM-Net, a novel, lightweight, multi-scale architecture that integrates the advantages of both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to enhance segmentation accuracy. LM-Net employs a lightweight multi-branch module to capture multi-scale features at the same level. In addition, we introduce two modules that concurrently capture local detail textures and global semantics from multi-scale features at different levels: the Local Feature Transformer (LFT) and the Global Feature Transformer (GFT). The LFT applies local window self-attention to capture local detail textures, while the GFT leverages global self-attention to capture global contextual semantics. By combining these modules, our model achieves complementarity between local and global representations, alleviating the problem of blurred segmentation boundaries in medical image segmentation. To evaluate the feasibility of LM-Net, we conducted extensive experiments on three publicly available datasets spanning different modalities. Our model achieves state-of-the-art results, surpassing previous methods, while requiring only 4.66G FLOPs and 5.4M parameters. These consistent results across modalities demonstrate the effectiveness and adaptability of LM-Net for a range of medical image segmentation tasks.
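To make the LFT/GFT distinction concrete, the sketch below (not the authors' code; the abstract gives no implementation details) contrasts the two attention styles it names: self-attention restricted to local windows, as in the LFT, versus self-attention over all spatial positions, as in the GFT. Window size, head count, channel width, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch, assuming PyTorch and square feature maps whose sides are
# divisible by the window size. The attention layers are freshly initialized
# here purely for shape illustration.
import torch
import torch.nn as nn


def window_attention(x, window=4, heads=4):
    """Local window self-attention (LFT-style): tokens attend only within
    non-overlapping (window x window) patches, capturing local detail."""
    b, c, h, w = x.shape
    attn = nn.MultiheadAttention(c, heads, batch_first=True)
    # Partition the feature map into windows: (b*nWindows, window*window, c).
    x = x.view(b, c, h // window, window, w // window, window)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, window * window, c)
    out, _ = attn(x, x, x)  # attention computed independently per window
    # Reverse the window partition back to (b, c, h, w).
    out = out.view(b, h // window, w // window, window, window, c)
    return out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)


def global_attention(x, heads=4):
    """Global self-attention (GFT-style): every spatial position attends to
    all others, capturing global contextual semantics."""
    b, c, h, w = x.shape
    attn = nn.MultiheadAttention(c, heads, batch_first=True)
    tokens = x.flatten(2).transpose(1, 2)   # (b, h*w, c)
    out, _ = attn(tokens, tokens, tokens)   # one attention map over all tokens
    return out.transpose(1, 2).view(b, c, h, w)


if __name__ == "__main__":
    feat = torch.randn(1, 32, 16, 16)       # a toy multi-scale feature map
    print(window_attention(feat).shape)     # torch.Size([1, 32, 16, 16])
    print(global_attention(feat).shape)     # torch.Size([1, 32, 16, 16])
```

The cost difference motivates the split: global attention scales quadratically with the number of spatial positions, while window attention scales quadratically only with the (fixed) window size, which is one way a design like this can stay within a small FLOPs budget.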