Transformer-based neural networks have achieved promising performance on many biomedical image segmentation tasks, owing to the better global information modeling provided by the self-attention mechanism. However, most methods are still designed for 2D medical images and ignore the essential 3D volume information. The main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism \cite{vaswani2017attention}. In this paper, we propose a novel encoder-decoder transformer architecture for 3D medical image segmentation with linear complexity. Furthermore, we introduce a dynamic token concept to further reduce the number of tokens used in the self-attention calculation. Taking advantage of the global information modeling, we provide uncertainty maps from different hierarchy stages. We evaluate this method on multiple challenging CT pancreas segmentation datasets. Our results show that the proposed 3D Transformer-based segmentor can provide promising segmentation performance and accurate uncertainty quantification using a single annotation. Code is available at https://github.com/freshman97/LinTransUNet.