Multimodal fusion frameworks, which integrate diverse medical imaging modalities (e.g., MRI, CT), have shown great potential in applications such as skin cancer detection, dementia diagnosis, and brain tumor prediction. However, existing multimodal fusion methods face two significant challenges. First, they often rely on computationally expensive models, which limits their applicability in low-resource environments. Second, they often employ cascaded attention modules, which may increase the risk of information loss during inter-module transitions and hinder the capture of robust shared representations across modalities. This restricts their generalization in multi-disease analysis tasks. To address these limitations, we propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net), composed of two novel core blocks: (a) a computationally efficient residual adaptive learning attention block that captures refined modality-specific representations, and (b) a dual-view cascaded attention block that learns robust shared representations across diverse modalities. Extensive experiments on ten publicly available datasets show that HyPCA-Net significantly outperforms existing leading methods, with improvements of up to 5.2% in performance and reductions of up to 73.1% in computational cost. Code: https://github.com/misti1203/HyPCA-Net.