Current deep learning methods for medical image segmentation are limited in learning either global semantic information or local contextual information. To tackle these issues, a novel network named SegTransVAE is proposed in this paper. SegTransVAE is built upon an encoder-decoder architecture that exploits a transformer and adds a variational autoencoder (VAE) branch to the network to reconstruct the input images jointly with segmentation. To the best of our knowledge, this is the first method to combine CNN, transformer, and VAE. Evaluation on several recently introduced datasets shows that SegTransVAE outperforms previous methods in Dice Score and $95\%$ Hausdorff Distance while having an inference time comparable to that of a simple CNN-based network. The source code is available at: https://github.com/itruonghai/SegTransVAE.
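To make the Dice Score and the idea of training segmentation jointly with VAE reconstruction more concrete, the following is a minimal sketch in plain Python. It is not taken from the SegTransVAE repository: the function names, the choice of soft Dice plus mean squared error plus KL divergence, and the weights `w_recon` and `w_kl` are all illustrative assumptions, and masks and images are treated as flat lists for simplicity.

```python
import math


def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two flattened binary masks."""
    pred = [bool(p) for p in pred]
    target = [bool(t) for t in target]
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    denom = sum(pred) + sum(target)
    if denom == 0:
        return 1.0  # both masks empty: counted as perfect agreement (a convention)
    return 2.0 * intersection / denom


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def joint_loss(seg_prob, seg_target, recon, image, mu, logvar,
               w_recon=0.1, w_kl=0.1, eps=1e-7):
    """Hypothetical joint objective: soft Dice loss + weighted MSE + weighted KL.

    The loss terms and the default weights are assumptions for illustration,
    not values reported by the paper.
    """
    overlap = sum(p * t for p, t in zip(seg_prob, seg_target))
    soft_dice = 2.0 * overlap / (sum(seg_prob) + sum(seg_target) + eps)
    seg_loss = 1.0 - soft_dice
    recon_loss = sum((r - x) ** 2 for r, x in zip(recon, image)) / len(image)
    kl = kl_to_standard_normal(mu, logvar)
    return seg_loss + w_recon * recon_loss + w_kl * kl
```

In this sketch the reconstruction and KL terms act as a regularizer on the shared encoder, which is the usual motivation for adding a VAE branch to a segmentation network; the weights would need to be tuned for any real dataset.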