Transformers have achieved significant success in medical image segmentation, owing to their ability to capture long-range dependencies. Previous works incorporate convolutional layers into the encoders of transformers, thereby enhancing their ability to learn local relationships among pixels. However, transformers may still suffer from limited generalization and reduced robustness, which is attributed to the insufficient spatial recovery ability of their decoders. To address this issue, we propose a decoder based on convolutional sparse vector coding, namely the CAScaded multi-layer Convolutional Sparse vector Coding DEcoder (CASCSCDE), which represents the features extracted by the encoder using sparse vectors. To demonstrate the effectiveness of CASCSCDE, we choose the widely used TransUNet model and integrate CASCSCDE into it, yielding the TransCASCSCDE architecture. Our experiments show that TransUNet with CASCSCDE significantly improves performance on the Synapse benchmark, with gains of up to 3.15\% in DICE and 1.16\% in mIoU. CASCSCDE opens new avenues for constructing decoders based on convolutional sparse vector coding.
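To make the idea of "representing features using sparse vectors" concrete, the following is a minimal sketch of cascaded (layer-by-layer) sparse coding, solved with ISTA (iterative soft-thresholding). It is not the authors' CASCSCDE implementation. Several assumptions apply: dense dictionary matrices stand in for convolutional dictionaries (a convolution is a linear operator that can be written as a structured Toeplitz-like matrix), the dictionaries are random here rather than learned, and the function names and the parameters `lam` and `n_iter` are hypothetical. Each layer finds a sparse code of the previous layer's code, and decoding multiplies the deepest code back through the dictionaries.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(D, x, lam, n_iter=200):
    """Approximately solve min_z 0.5 * ||x - D z||^2 + lam * ||z||_1 with ISTA.

    D is a dense stand-in for a (hypothetical) convolutional dictionary.
    """
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)  # gradient of the quadratic data term
        z = soft_threshold(z - step * grad, step * lam)
    return z


def cascaded_sparse_coding(dicts, x, lams, n_iter=200):
    """Layer-by-layer sparse coding: x ~ D1 z1, z1 ~ D2 z2, and so on."""
    codes = []
    signal = x
    for D, lam in zip(dicts, lams):
        z = ista(D, signal, lam, n_iter)
        codes.append(z)
        signal = z  # the next layer codes this layer's sparse vector
    return codes


def decode(dicts, deepest_code):
    """Reconstruct the input estimate as D1 @ D2 @ ... @ DL @ z_L."""
    out = deepest_code
    for D in reversed(dicts):
        out = D @ out
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D1 = rng.standard_normal((16, 32))  # layer 1: 16-dim feature -> 32-dim code
    D2 = rng.standard_normal((32, 48))  # layer 2: 32-dim code -> 48-dim code
    x = rng.standard_normal(16)         # stand-in for an encoder feature vector
    codes = cascaded_sparse_coding([D1, D2], x, lams=[0.1, 0.1])
    x_hat = decode([D1, D2], codes[-1])
    print([int(np.count_nonzero(c)) for c in codes], x_hat.shape)
```

The soft-thresholding step is what drives many entries of each code to exactly zero; a learned decoder would typically unroll a fixed number of such iterations and train the dictionaries end to end, which this sketch does not attempt.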