Medical image segmentation plays a significant role in the automatic recognition and analysis of lesions. State-of-the-art methods, particularly those based on transformers, have been widely adopted for 3D semantic segmentation owing to their superior scalability and generalizability. However, plain vision transformers face challenges because they neglect local features and incur high computational complexity. To address these challenges, we make three key contributions. First, we propose SegStitch, a novel architecture that integrates transformers with denoising ODE (ordinary differential equation) blocks. Instead of taking whole 3D volumes as input, we adopt axial patches and customize patch-wise queries to ensure semantic consistency. Second, we conduct extensive experiments on the BTCV and ACDC datasets, achieving improvements of up to 11.48% and 6.71% in mDSC, respectively, over state-of-the-art methods. Third, our method demonstrates outstanding efficiency, reducing the number of parameters by 36.7% and the number of FLOPs by 10.7% compared to UNETR. These advances hold promise for adapting our method to real-world clinical practice. The code will be available at https://github.com/goblin327/SegStitch