TEC-Net: Vision Transformer Embrace Convolutional Neural Networks for Medical Image Segmentation

The hybrid architecture of convolutional neural networks (CNNs) and Transformers has become the most popular approach to medical image segmentation. However, existing networks based on this hybrid architecture suffer from two problems. First, although the CNN branch can capture local image features through convolution, vanilla convolution cannot extract image features adaptively. Second, although the Transformer branch can model the global information of images, conventional self-attention attends only to the spatial dimension and ignores channel and cross-dimensional self-attention, which leads to low segmentation accuracy on medical images with complex backgrounds. To solve these problems, we propose TEC-Net, in which a vision Transformer embraces convolutional neural networks for medical image segmentation. Our network has two advantages. First, dynamic deformable convolution (DDConv) is designed in the CNN branch. It not only overcomes the difficulty of adaptive feature extraction with fixed-size convolution kernels, but also removes the limitation that different inputs share the same convolution kernel parameters, effectively improving the feature expression ability of the CNN branch. Second, in the Transformer branch, a (shifted)-window adaptive complementary attention module ((S)W-ACAM) and a compact convolutional projection are designed to enable the network to fully learn the cross-dimensional long-range dependencies of medical images with few parameters and little computation. Experimental results show that the proposed TEC-Net provides better medical image segmentation results than state-of-the-art (SOTA) methods, including CNN and Transformer networks. In addition, TEC-Net requires fewer parameters and lower computational cost, and does not rely on pre-training. The code is publicly available at https://github.com/SR0920/TEC-Net.
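
To make the distinction between spatial and channel self-attention concrete, the following is a minimal sketch and not the authors' (S)W-ACAM. It assumes a single window of tokens, identity query/key/value projections (no learned weights), and a single head. The function names `spatial_self_attention` and `channel_self_attention` are hypothetical, chosen only for illustration. The point is the shape of the attention map: token-to-token (N x N) for spatial attention, channel-to-channel (C x C) for channel attention.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def spatial_self_attention(x):
    """x is N tokens by C channels; the attention map is N x N.

    Assumption: queries, keys, and values are all x itself.
    """
    scale = 1.0 / math.sqrt(len(x[0]))
    scores = [[v * scale for v in row] for row in matmul(x, transpose(x))]
    weights = softmax_rows(scores)
    return matmul(weights, x), weights

def channel_self_attention(x):
    """x is N tokens by C channels; the attention map is C x C.

    Each channel is treated as one "token" whose features are its N values.
    """
    xt = transpose(x)  # C x N
    scale = 1.0 / math.sqrt(len(x))
    scores = [[v * scale for v in row] for row in matmul(xt, x)]
    weights = softmax_rows(scores)
    return transpose(matmul(weights, xt)), weights  # output back in N x C

# Toy window: 4 tokens (for example a 2 x 2 window) with 3 channels each.
window = [
    [1.0, 0.0, 2.0],
    [0.5, 1.0, 0.0],
    [0.0, 2.0, 1.0],
    [1.5, 0.5, 0.5],
]
spatial_out, spatial_w = spatial_self_attention(window)
channel_out, channel_w = channel_self_attention(window)
print(len(spatial_w), len(spatial_w[0]))  # attention map over tokens
print(len(channel_w), len(channel_w[0]))  # attention map over channels
```

Both variants return an output with the same N x C shape as the input, so in principle their results can be combined. How TEC-Net actually weights and fuses the branches is not specified in this abstract and is not modelled here.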

