Medical image segmentation is a fundamental task in medical image analysis. This paper proposes a novel network architecture, referred to as Convolution, Transformer, and Operator (CTO). CTO combines Convolutional Neural Networks (CNNs), a Vision Transformer (ViT), and an explicit boundary detection operator to achieve high recognition accuracy while keeping a favorable balance between accuracy and efficiency. CTO follows the standard encoder-decoder segmentation paradigm: the encoder pairs a popular CNN backbone, which captures local semantic information, with a lightweight ViT assistant, which integrates long-range dependencies. To strengthen boundary learning, a boundary-guided decoder uses a boundary mask, obtained from a dedicated boundary detection operator, as explicit supervision to guide the decoding process. Evaluated on six challenging medical image segmentation datasets, CTO achieves state-of-the-art accuracy with competitive model complexity.
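As a rough illustration of how a boundary mask could be derived from a segmentation mask for use as explicit supervision, the sketch below applies a Sobel operator to a binary mask. The choice of Sobel, the zero padding, and the function name `boundary_mask` are assumptions made for this example; the abstract does not specify which boundary detection operator CTO uses.

```python
# Hypothetical sketch: Sobel is an assumed choice of boundary operator,
# not something the abstract specifies.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def boundary_mask(mask):
    """Return a binary boundary map for a 2D binary segmentation mask.

    A pixel is marked as boundary when it is foreground and the Sobel
    gradient of the zero-padded mask is non-zero at that pixel.
    """
    h, w = len(mask), len(mask[0])

    def px(r, c):
        # Zero padding outside the image.
        return mask[r][c] if 0 <= r < h and 0 <= c < w else 0

    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = px(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * v
                    gy += SOBEL_Y[dr + 1][dc + 1] * v
            if mask[r][c] and (gx != 0 or gy != 0):
                out[r][c] = 1
    return out


if __name__ == "__main__":
    # A 3x3 foreground square inside a 5x5 image.
    square = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in boundary_mask(square):
        print(row)
```

For the 3x3 square, the result marks the outer ring of the square and leaves its center pixel unmarked. In a training setup, a map like this could serve as the target for an auxiliary boundary prediction in the decoder. A symmetric kernel such as Sobel returns a zero gradient for an isolated single-pixel region, so this sketch would not mark such a pixel.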