Efficiently capturing multi-scale information and building long-range dependencies among pixels are essential for medical image segmentation because lesion regions and organs vary widely in size and shape. In this paper, we present Multi-scale Cross-axis Attention (MCA), which builds on efficient axial attention to address both challenges. Instead of simply chaining axial attention along the horizontal and vertical directions sequentially, we compute dual cross attentions between two parallel axial attentions to better capture global information. To handle the large variations in the sizes and shapes of lesion regions and organs, we also apply multiple strip-shaped convolutions with different kernel sizes in each axial attention path, which improves the efficiency of MCA in encoding spatial information. We build MCA upon the MSCAN backbone, yielding our network, termed MCANet. With only 4M+ parameters, MCANet performs even better than most previous works that use heavy backbones (e.g., Swin Transformer) on four challenging tasks: skin lesion segmentation, nuclei segmentation, abdominal multi-organ segmentation, and polyp segmentation. Code is available at https://github.com/haoshao-nku/medicalseg.git.
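To make the idea of dual cross attention between two parallel axial paths more concrete, the following is a minimal NumPy sketch, not the released MCANet code. Several details are assumptions made only for illustration: the function names, the strip kernel sizes (7, 11, 21), the use of fixed averaging kernels in place of learned depthwise convolutions, a single attention head without learned projections, the choice of which path supplies queries versus keys and values, and the residual sum at the end.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def strip_conv(x, k, axis):
    """Depthwise strip convolution on an (H, W, C) map with zero padding.

    Uses a fixed length-k averaging kernel (a stand-in for learned weights).
    axis=1 gives a 1 x k horizontal strip, axis=0 a k x 1 vertical strip.
    """
    pad = [(0, 0)] * 3
    pad[axis] = (k // 2, k // 2)
    xp = np.pad(x, pad)
    n = x.shape[axis]
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        out += np.take(xp, np.arange(i, i + n), axis=axis)
    return out / k


def multi_scale_strip(x, axis, kernel_sizes=(7, 11, 21)):
    """Average of strip convolutions at several (assumed) kernel sizes."""
    return sum(strip_conv(x, k, axis) for k in kernel_sizes) / len(kernel_sizes)


def axial_cross_attention(q_src, kv_src, axis):
    """Single-head attention along one spatial axis.

    Queries come from q_src, keys and values from kv_src. Every line along
    `axis` (0: a column, 1: a row) is treated as an independent sequence.
    """
    q = np.moveaxis(q_src, axis, 1)    # (lines, length, C)
    kv = np.moveaxis(kv_src, axis, 1)  # (lines, length, C)
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ kv
    return np.moveaxis(out, 1, axis)   # back to (H, W, C)


def cross_axis_attention(x):
    """Two parallel strip-conv paths, each attending to the other path."""
    fx = multi_scale_strip(x, axis=1)  # horizontal-strip features
    fy = multi_scale_strip(x, axis=0)  # vertical-strip features
    # Assumed pairing: each path queries the features of the other path.
    out_w = axial_cross_attention(q_src=fy, kv_src=fx, axis=1)
    out_h = axial_cross_attention(q_src=fx, kv_src=fy, axis=0)
    return x + out_w + out_h           # residual sum is an assumption


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 10, 4))  # toy (H, W, C) feature map
out = cross_axis_attention(feat)
print(out.shape)
```

Because attention runs along one axis at a time, each score matrix is only length x length per row or column rather than (H·W) x (H·W), which is where the efficiency of axial attention comes from; the cross pairing lets each directional path draw on features computed along the other direction.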