SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation

The adoption of Vision Transformers (ViTs) based architectures represents a significant advancement in 3D Medical Image (MI) segmentation, surpassing traditional Convolutional Neural Network (CNN) models by enhancing global contextual understanding. While this paradigm shift has significantly enhanced 3D segmentation performance, state-of-the-art architectures require extremely large and complex architectures with large scale computing resources for training and deployment. Furthermore, in the context of limited datasets, often encountered in medical imaging, larger models can present hurdles in both model generalization and convergence. In response to these challenges and to demonstrate that lightweight models are a valuable area of research in 3D medical imaging, we present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features. Additionally, SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features to produce highly accurate segmentation masks. The proposed memory efficient Transformer preserves the performance characteristics of a significantly larger model in a compact design. SegFormer3D democratizes deep learning for 3D medical image segmentation by offering a model with 33x less parameters and a 13x reduction in GFLOPS compared to the current state-of-the-art (SOTA). We benchmark SegFormer3D against the current SOTA models on three widely used datasets Synapse, BRaTs, and ACDC, achieving competitive results. Code: https://github.com/OSUPCVLab/SegFormer3D.git

翻译：基于视觉Transformer（ViTs）的架构的采用代表了三维医学图像分割领域的一项重大进展，通过增强全局上下文理解能力超越了传统卷积神经网络模型。尽管这种范式转变显著提升了三维分割性能，但现有最先进的架构需要极为庞大且复杂的结构以及大规模计算资源进行训练和部署。此外，在医学影像中常见的小规模数据集背景下，更大的模型可能在模型泛化和收敛方面带来挑战。针对这些问题，并为了证明轻量化模型是三维医学成像领域有价值的研究方向，我们提出了SegFormer3D——一种层次化Transformer，可跨多尺度体素特征计算注意力。此外，SegFormer3D避免了复杂的解码器，采用全MLP解码器聚合局部与全局注意力特征，以生成高精度的分割掩膜。所提出的内存高效Transformer在紧凑设计中保留了显著更大模型的性能特征。SegFormer3D通过提供相比当前最先进模型参数减少33倍、GFLOPS降低13倍的模型，推动了三维医学图像分割深度学习的普及化。我们在三个广泛使用的数据集（Synapse、BRaTs和ACDC）上对SegFormer3D与当前最先进模型进行了基准测试，取得了竞争性的结果。代码：https://github.com/OSUPCVLab/SegFormer3D.git

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日