Accurate and computationally efficient 3D medical image segmentation remains a critical challenge in clinical workflows. Transformer-based architectures often demonstrate superior global contextual modeling, but at the expense of excessive parameter counts and memory demands, restricting their clinical deployment. We propose RefineFormer3D, a lightweight hierarchical transformer architecture that balances segmentation accuracy and computational efficiency for volumetric medical imaging. The architecture integrates three key components: (i) a GhostConv3D-based patch embedding for efficient feature extraction with minimal redundancy, (ii) a MixFFN3D module with low-rank projections and depthwise convolutions for parameter-efficient feed-forward computation, and (iii) a cross-attention fusion decoder enabling adaptive multi-scale skip-connection integration. RefineFormer3D contains only 2.94M parameters, substantially fewer than contemporary transformer-based methods. Extensive experiments on the ACDC and BraTS benchmarks demonstrate that RefineFormer3D achieves average Dice scores of 93.44\% and 85.9\%, respectively, outperforming or matching state-of-the-art methods while requiring significantly fewer parameters. Furthermore, the model achieves fast inference (8.35 ms per volume on GPU) with low memory requirements, supporting deployment in resource-constrained clinical environments. These results establish RefineFormer3D as an effective and scalable solution for practical 3D medical image segmentation.
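The two parameter-saving mechanisms named above, ghost-style 3D convolution and low-rank feed-forward projections, can be illustrated with a back-of-the-envelope parameter count. The sketch below is illustrative only: the channel sizes, ghost ratio, and rank are hypothetical choices, not values taken from RefineFormer3D.

```python
# Hypothetical parameter-count comparison for the two efficiency
# mechanisms in the abstract. All sizes below are illustrative
# assumptions, not the paper's actual configuration.

def conv3d_params(c_in, c_out, k):
    """Parameters of a standard 3D convolution (bias omitted)."""
    return c_in * c_out * k ** 3

def ghost_conv3d_params(c_in, c_out, k, ratio=2):
    """Ghost-style 3D conv: a primary convolution produces only
    c_out // ratio channels; the remaining channels come from a
    cheap depthwise convolution applied to the primary features."""
    primary = c_out // ratio
    cheap = c_out - primary
    # Depthwise step costs one k^3 filter per generated channel.
    return conv3d_params(c_in, primary, k) + cheap * k ** 3

def ffn_params(d, hidden):
    """Plain two-layer feed-forward block (biases omitted)."""
    return d * hidden + hidden * d

def lowrank_ffn_params(d, hidden, rank):
    """Each projection factored through a rank-r bottleneck:
    d -> rank -> hidden (and back), instead of d -> hidden."""
    return 2 * (d * rank + rank * hidden)

if __name__ == "__main__":
    print("Conv3d 64->128, k=3:    ", conv3d_params(64, 128, 3))
    print("GhostConv3D equivalent: ", ghost_conv3d_params(64, 128, 3))
    print("FFN d=256, hidden=1024: ", ffn_params(256, 1024))
    print("Low-rank (r=64) variant:", lowrank_ffn_params(256, 1024, 64))
```

With these illustrative sizes, the ghost variant roughly halves the convolution's parameters and the low-rank factorization cuts the feed-forward block by about 3x, which is the kind of saving that lets the full model stay under 3M parameters.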