MS-Twins: Multi-Scale Deep Self-Attention Networks for Medical Image Segmentation

Although the transformer is the preferred architecture in natural language processing, few studies have applied it to medical imaging. Because it can model long-range dependencies, the transformer is expected to help convolutional neural networks overcome the limitations of their inherent spatial inductive bias. Recently proposed transformer-based segmentation methods use the transformer only as an auxiliary module that encodes global context into a convolutional representation, and there is hardly any study of how to optimally combine self-attention (the core of the transformer) with convolution. To address this problem, this article proposes MS-Twins (Multi-Scale Twins), a powerful segmentation model built on the combination of self-attention and convolution. By combining different scales and cascading features, MS-Twins better captures both semantic and fine-grained information. On two commonly used datasets, Synapse and ACDC, MS-Twins improves significantly on previous transformer-based methods. In particular, its performance on Synapse is 8% higher than that of SwinUNet. Even compared with nnUNet, the best fully convolutional medical image segmentation network, MS-Twins retains a slight advantage on both Synapse and ACDC.
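The abstract does not spell out the architecture, so the following is only a toy sketch of the general idea it names: a global self-attention branch and a local convolution branch are combined, and the combination is repeated at more than one scale. Everything here is an assumption made for illustration, not the MS-Twins design: the function names, the identity query/key/value projections, the element-wise sum used to merge the two branches, the pair-averaging downsampling, and the use of 1-D token sequences instead of 2-D feature maps.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention; Q, K, V are the tokens themselves
    (identity projections, a simplification for this sketch)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def conv1d_same(tokens, kernel):
    """Per-channel sliding-window filter along the token axis with zero padding.
    Like most deep learning 'convolutions', the kernel is not flipped.
    Assumes an odd-length kernel."""
    n, d, r = len(tokens), len(tokens[0]), len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * d
        for offset, w in enumerate(kernel):
            src = i + offset - r
            if 0 <= src < n:
                for j in range(d):
                    row[j] += w * tokens[src][j]
        out.append(row)
    return out

def hybrid_block(tokens, kernel):
    """Merge the global (attention) and local (convolution) branches by summation."""
    global_branch = self_attention(tokens)
    local_branch = conv1d_same(tokens, kernel)
    return [[a + b for a, b in zip(g, l)] for g, l in zip(global_branch, local_branch)]

def downsample(tokens):
    """Average adjacent token pairs to obtain a coarser scale."""
    return [[(a + b) / 2 for a, b in zip(tokens[i], tokens[i + 1])]
            for i in range(0, len(tokens) - 1, 2)]

def multi_scale_features(tokens, kernel, num_scales=2):
    """Apply the hybrid block at successively coarser scales and collect the outputs."""
    features = []
    current = tokens
    for _ in range(num_scales):
        features.append(hybrid_block(current, kernel))
        current = downsample(current)
    return features

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    for scale, feats in enumerate(multi_scale_features(tokens, [0.25, 0.5, 0.25])):
        print(f"scale {scale}: {len(feats)} tokens x {len(feats[0])} channels")
```

A real segmentation network would use learned projections, 2-D feature maps, and a decoder that fuses the per-scale features; the sketch only shows where attention supplies global context and convolution supplies local context.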

