We adapt AMBER, a model inspired by remote sensing, from multi-band image segmentation to 3D medical datacube segmentation. To address the computational bottleneck of volumetric transformers, we propose the AMBER-AFNO architecture, which replaces multi-head self-attention with Adaptive Fourier Neural Operators (AFNO). Rather than computing pairwise spatial interactions between tokens, AFNO mixes tokens globally in the frequency domain, avoiding the $\mathcal{O}(N^2)$ attention-weight computation for $N$ tokens. As a result, AMBER-AFNO achieves quasi-linear computational complexity and linear memory scaling, reducing reliance on dense transformers while preserving the ability to model global context. These attention-free spectral operations also yield a compact parameterization. We evaluate AMBER-AFNO on three public datasets: ACDC, Synapse, and BraTS. On these datasets, the model achieves state-of-the-art or near-state-of-the-art results in Dice similarity coefficient (DSC) and 95th-percentile Hausdorff distance (HD95). Compared with recent compact CNN and Transformer architectures, our approach yields higher Dice scores while maintaining a compact model size. Overall, our results show that frequency-domain token mixing with AFNO provides a fast and efficient alternative to self-attention for 3D medical image segmentation.
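To make the idea of frequency-domain token mixing concrete, the following is a minimal NumPy sketch of an AFNO-style mixer on a 3D token grid. It is an illustration under stated assumptions, not the AMBER-AFNO implementation: the function names (`soft_shrink`, `afno_mix_3d`), the tensor shapes, the number of channel blocks, the ReLU applied separately to real and imaginary parts, and the shrinkage threshold `lam` are all hypothetical choices. The FFT over the token grid costs $\mathcal{O}(N \log N)$ and no $N \times N$ attention matrix is ever formed.

```python
import numpy as np

def soft_shrink(z, lam):
    """Complex soft-thresholding: reduce each magnitude by lam, keep the phase."""
    mag = np.abs(z)
    scale = np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-12)
    return z * scale

def afno_mix_3d(x, w1, b1, w2, b2, lam=0.01):
    """AFNO-style token mixing (illustrative sketch, shapes are assumptions).

    x  : (D, H, W, C) real-valued token grid
    w1 : (K, Cb, Ch) complex block weights, with K * Cb == C
    b1 : (K, Ch) complex bias
    w2 : (K, Ch, Cb) complex block weights
    b2 : (K, Cb) complex bias
    """
    D, H, W, C = x.shape
    K, Cb, _ = w1.shape
    assert K * Cb == C, "channels must split evenly into K blocks"

    # 1) Move tokens to the frequency domain along the three spatial axes.
    xf = np.fft.rfftn(x, axes=(0, 1, 2), norm="ortho")
    xf = xf.reshape(*xf.shape[:3], K, Cb)

    # 2) Block-diagonal two-layer MLP over channels, shared by all frequencies.
    h = np.einsum("dhwkc,kco->dhwko", xf, w1) + b1
    h = np.maximum(h.real, 0.0) + 1j * np.maximum(h.imag, 0.0)
    y = np.einsum("dhwko,koc->dhwkc", h, w2) + b2

    # 3) Sparsify the spectrum, then return to the spatial domain.
    y = soft_shrink(y, lam).reshape(*y.shape[:3], C)
    out = np.fft.irfftn(y, s=(D, H, W), axes=(0, 1, 2), norm="ortho")

    # 4) Residual connection, as is common in transformer-style blocks.
    return out + x

# Small usage example with random (untrained) weights.
rng = np.random.default_rng(0)
D, H, W, C, K = 4, 4, 4, 8, 2
Cb = Ch = C // K

def crandn(*shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

x = rng.normal(size=(D, H, W, C))
w1, b1 = crandn(K, Cb, Ch), crandn(K, Ch)
w2, b2 = crandn(K, Ch, Cb), crandn(K, Cb)
out = afno_mix_3d(x, w1, b1, w2, b2)
```

In this sketch the spatial mixing comes from the nonlinearities applied in the frequency domain: a purely linear per-frequency channel map with shared weights would reduce to a pointwise channel mix in the spatial domain.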