Transformer structures have been widely used in computer vision and have recently made an impact in medical image registration. However, most registration networks use the Transformer in a straightforward way: they merely apply the attention mechanism to boost feature learning, as segmentation networks do, and are not specifically designed for the registration task. In this paper, we propose a novel motion decomposition Transformer (ModeT) that explicitly models multiple motion modalities by fully exploiting the intrinsic capability of the Transformer structure for deformation estimation. ModeT naturally transforms the multi-head neighborhood attention relationship into a multi-coordinate relationship to model multiple motion modes. A competitive weighting module (CWM) then fuses the multiple deformation sub-fields to generate the final deformation field. Extensive experiments on two public brain magnetic resonance imaging (MRI) datasets show that our method outperforms current state-of-the-art registration networks and Transformers, demonstrating the potential of ModeT for the challenging non-rigid deformation estimation problem. The benchmarks and our code are publicly available at https://github.com/ZAX130/SmileCode.
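To make the idea of turning attention into coordinates more concrete, the following is a minimal sketch for a single voxel, not the authors' implementation. It assumes a 2D 3x3 neighborhood for brevity, reads each head's attention weights as a distribution over relative offsets, and takes the expected offset as that head's displacement. The fusion step is a simplified stand-in for the CWM: the per-head scores passed to it are hypothetical inputs, and how the actual module computes its weights is not specified in the abstract. All function and variable names are illustrative.

```python
import math
from itertools import product


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Relative offsets of a 3x3 neighborhood (assumption: 2D instead of a volume).
OFFSETS = list(product((-1, 0, 1), repeat=2))


def head_displacement(query, keys):
    """Expected offset under one head's attention over the neighborhood.

    query: feature vector of the voxel in one image.
    keys:  one feature vector per entry of OFFSETS, taken from the other image.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return tuple(
        sum(w * off[d] for w, off in zip(weights, OFFSETS)) for d in range(2)
    )


def fuse(sub_fields, head_scores):
    """Hypothetical stand-in for competitive weighting: softmax over heads,
    then a weighted sum of the per-head displacements."""
    weights = softmax(head_scores)
    return tuple(
        sum(w * f[d] for w, f in zip(weights, sub_fields)) for d in range(2)
    )


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Head A: only the neighbor at offset (1, 0) matches the query strongly.
    keys_a = [[50.0, 0.0] if off == (1, 0) else [0.0, 0.0] for off in OFFSETS]
    # Head B: no neighbor stands out, so the expected offset is zero.
    keys_b = [[0.0, 0.0] for _ in OFFSETS]
    subs = [head_displacement(query, keys_a), head_displacement(query, keys_b)]
    print(subs, fuse(subs, [0.0, 0.0]))
```

In this toy setting each head yields one candidate motion, and the softmax across heads lets the candidates compete for influence on the final displacement. A real registration network would do this densely over the whole feature map and at multiple resolutions.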