Multimodal medical image fusion is a crucial task that combines complementary information from different imaging modalities into a unified representation, thereby enhancing diagnostic accuracy and treatment planning. While deep learning methods, particularly Convolutional Neural Networks (CNNs) and Transformers, have significantly advanced fusion performance, many existing CNN-based methods fall short in capturing fine-grained multiscale and edge features, leading to suboptimal feature integration. Transformer-based models, on the other hand, are computationally intensive in both the training and fusion stages, making them impractical for real-time clinical use. Moreover, the clinical applicability of fused images remains largely underexplored. In this paper, we propose a novel CNN-based architecture that addresses these limitations by introducing a Dilated Residual Attention Network module for effective multiscale feature extraction, coupled with a gradient operator to enhance edge-detail learning. To ensure fast and efficient fusion, we present a parameter-free fusion strategy based on the weighted nuclear norm of softmax, which requires no additional computation during training or inference. Extensive experiments, including a downstream brain tumor classification task, demonstrate that our approach outperforms various baseline methods in visual quality, texture preservation, and fusion speed, making it a practical solution for real-world clinical applications. The code will be released at https://github.com/simonZhou86/en_dran.
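To make the parameter-free fusion strategy concrete, the following is a minimal sketch of one plausible reading of "weighted nuclear norm of softmax" as a saliency score: each modality's feature maps are passed through a spatial softmax, the nuclear norm (sum of singular values) of the resulting maps is used as a per-modality weight, and the features are fused as a convex combination. All function names are hypothetical and the exact weighting scheme in the paper may differ; this is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np


def softmax_nuclear_score(feat: np.ndarray) -> float:
    """Hypothetical saliency score: spatial softmax per channel,
    then the mean nuclear norm of the resulting maps."""
    c, h, w = feat.shape
    flat = feat.reshape(c, -1)
    # numerically stable softmax over spatial locations
    ex = np.exp(flat - flat.max(axis=1, keepdims=True))
    sm = (ex / ex.sum(axis=1, keepdims=True)).reshape(c, h, w)
    # nuclear norm = sum of singular values of each channel map
    return float(np.mean([np.linalg.norm(sm[i], ord="nuc") for i in range(c)]))


def fuse_features(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Parameter-free fusion: convex combination weighted by the
    (assumed) softmax nuclear-norm saliency of each modality."""
    wa = softmax_nuclear_score(feat_a)
    wb = softmax_nuclear_score(feat_b)
    alpha = wa / (wa + wb)
    return alpha * feat_a + (1.0 - alpha) * feat_b
```

Because the weights are computed directly from the features at fusion time, no learnable parameters are introduced, which is consistent with the abstract's claim that the strategy adds no extra computation during training.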