The Swin Transformer has recently attracted attention in medical image analysis for its computational efficiency and long-range modeling capability. These properties make it well suited to establishing correspondences between distant voxels in complex abdominal image registration tasks. However, transformer-based registration models merge multiple voxels into a single semantic token, which restricts the transformer to modeling and generating only coarse-grained spatial information. To address this issue, we propose the Recovery Feature Resolution Network (RFRNet), which allows the transformer to contribute fine-grained spatial information and rich semantic correspondences to higher-resolution levels. Furthermore, shifted window partitioning is inflexible: it cannot perceive semantic information over uncertain distances, nor can it automatically bridge global connections between windows. We therefore present Weighted Window Attention (WWA), which automatically builds global interactions between windows. It is implemented after the regular and cyclic-shift window partitioning operations within the Swin Transformer block. The proposed unsupervised deformable image registration model, named RFR-WWANet, detects long-range correlations and facilitates meaningful semantic relevance of anatomical structures. Qualitative and quantitative results show that RFR-WWANet achieves significant improvements over current state-of-the-art methods. Ablation experiments demonstrate the effectiveness of the RFRNet and WWA designs. Our code is available at \url{https://github.com/MingR-Ma/RFR-WWANet}.