D-TrAttUnet: Toward Hybrid CNN-Transformer Architecture for Generic and Subtle Segmentation in Medical Images

Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As diseases grow more complex and case numbers rise, machine-based imaging analysis has become indispensable, serving medical experts as both a tool and an assistant. Lesion segmentation is a particularly demanding task in this area, difficult even for experienced radiologists, which highlights the need for robust machine learning approaches to support medical staff. In response, we present the D-TrAttUnet architecture. The framework builds on the observation that different diseases often target specific organs. It follows an encoder-decoder design with a composite Transformer-CNN encoder and dual decoders. The encoder has two paths: the Transformer path and the Encoders Fusion Module path. The dual-decoder configuration uses two identical decoders, each equipped with attention gates, so that the model segments lesions and organs simultaneously and integrates their segmentation losses. To validate the approach, we evaluated it on Covid-19 infection segmentation and bone metastasis segmentation. We also investigated the adaptability of the model by testing it without the second decoder on gland and nuclei segmentation. The results confirm the superiority of our approach, especially for Covid-19 infection and bone metastasis segmentation. In addition, the hybrid encoder performs strongly on gland and nuclei segmentation, which supports its use as a general-purpose encoder in medical image analysis.
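The integration of the lesion and organ segmentation losses can be illustrated with a minimal sketch. The abstract does not specify the loss function or how the two terms are weighted, so the binary cross-entropy term, the `organ_weight` parameter, and all function names below are assumptions made for illustration only, not the configuration used in D-TrAttUnet.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def joint_segmentation_loss(lesion_probs, lesion_mask,
                            organ_probs, organ_mask,
                            organ_weight=1.0):
    """Combine the losses of the two decoder outputs into one training signal.

    Assumption: a weighted sum of two BCE terms. The actual combination
    rule used by the authors is not stated in the abstract.
    """
    lesion_loss = binary_cross_entropy(lesion_probs, lesion_mask)
    organ_loss = binary_cross_entropy(organ_probs, organ_mask)
    return lesion_loss + organ_weight * organ_loss


# Toy example: four pixels, one prediction per decoder (hypothetical values).
lesion_probs = [0.9, 0.2, 0.1, 0.8]
lesion_mask = [1, 0, 0, 1]
organ_probs = [0.95, 0.9, 0.3, 0.85]
organ_mask = [1, 1, 0, 1]

loss = joint_segmentation_loss(lesion_probs, lesion_mask,
                               organ_probs, organ_mask)
print(f"joint loss: {loss:.4f}")
```

Setting `organ_weight=0.0` reduces the objective to the lesion term alone, which loosely mirrors the single-decoder setting described for gland and nuclei segmentation.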
