The advent of the Vision Transformer (ViT) has brought substantial advances on 3D volumetric benchmarks, particularly in 3D medical image segmentation. Concurrently, Multi-Layer Perceptron (MLP) networks have regained popularity among researchers because they achieve results comparable to ViT without the heavy self-attention module. This paper introduces a permutable hybrid network for volumetric medical image segmentation, named PHNet, which exploits the advantages of both convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic anisotropy problem of 3D volumetric data by using both 2D and 3D CNNs to extract local information. In addition, we propose an efficient Multi-Layer Permute Perceptron module, named MLPP, which enhances the original MLP by capturing long-range dependencies while retaining positional information. Extensive experiments show that PHNet outperforms state-of-the-art methods on two public datasets, COVID-19-20 and Synapse. Moreover, the ablation study demonstrates the effectiveness of PHNet in harnessing the strengths of both CNN and MLP. The code will be made publicly available upon acceptance.
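The idea of capturing long-range dependencies while retaining positional information can be illustrated with axis-wise mixing: a linear map is applied along one spatial axis at a time, so every voxel sees its whole row, column, and depth line while its coordinates on the other two axes stay fixed. The sketch below is only an assumption about the general technique and is not the authors' MLPP; the names `mix_along_axis` and `axial_mlp` are hypothetical, the volume is single-channel, and the weights are fixed rather than learned.

```python
def mix_along_axis(volume, weights, axis):
    """Apply a linear map along one spatial axis of a nested-list volume [H][W][D].

    `weights` is an n x n matrix, where n is the size of the chosen axis.
    Positions along the other two axes are left untouched.
    """
    H, W, D = len(volume), len(volume[0]), len(volume[0][0])
    n = (H, W, D)[axis]
    assert len(weights) == n and all(len(row) == n for row in weights)
    out = [[[0.0] * D for _ in range(W)] for _ in range(H)]
    for h in range(H):
        for w in range(W):
            for d in range(D):
                idx = [h, w, d]
                total = 0.0
                for k in range(n):
                    src = list(idx)
                    src[axis] = k  # walk along the mixed axis only
                    total += weights[idx[axis]][k] * volume[src[0]][src[1]][src[2]]
                out[h][w][d] = total
    return out


def axial_mlp(volume, w_h, w_w, w_d):
    """Sum the three axis-wise mixings and add a residual connection.

    Hypothetical simplification: a real module would also have channels,
    learned weights, and nonlinearities.
    """
    H, W, D = len(volume), len(volume[0]), len(volume[0][0])
    branches = [mix_along_axis(volume, w, a) for a, w in enumerate((w_h, w_w, w_d))]
    return [
        [
            [volume[h][w][d] + sum(b[h][w][d] for b in branches) for d in range(D)]
            for w in range(W)
        ]
        for h in range(H)
    ]


def ones(n):
    """All-ones n x n mixing matrix, used here only for demonstration."""
    return [[1] * n for _ in range(n)]


# A 3 x 2 x 2 volume with a single impulse at the corner (0, 0, 0).
H, W, D = 3, 2, 2
impulse = [[[0.0] * D for _ in range(W)] for _ in range(H)]
impulse[0][0][0] = 1.0
mixed = axial_mlp(impulse, ones(H), ones(W), ones(D))
```

With all-ones weights, the impulse reaches the far end of each axis in a single layer (for example `mixed[2][0][0]` is nonzero), while voxels that share no axis line with it, such as `mixed[2][1][0]`, remain zero. That is the sense in which axis-wise mixing is long-range yet position-aware.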