Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone and so fail to replicate the clinician's multimodal reasoning. We explore a two-branch multimodal network that combines raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify scans into four classes: glioma, meningioma, pituitary tumor, and no tumor. A pre-trained CNN backbone encodes the image stream, while a dedicated MLP encodes the radiomic stream. The two streams are fused by one of three strategies: concatenation, gating, or bidirectional cross-modal attention. Across nine experimental runs on a balanced 7,200-image dataset, all multimodal configurations outperform the unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.
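To make the fusion step concrete, the following is a minimal NumPy sketch of concatenation and gated fusion applied to two embeddings. It is an illustration under stated assumptions, not the architecture used in this work: the gate formulation (a sigmoid over a linear map of the concatenated embeddings), the embedding width `d`, and every function and variable name are hypothetical, and random arrays stand in for the CNN and MLP outputs.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def concat_fusion(img_emb, rad_emb):
    """Concatenation fusion: stack the two embeddings along the feature axis."""
    return np.concatenate([img_emb, rad_emb], axis=-1)


def gated_fusion(img_emb, rad_emb, W, b):
    """Gated fusion (assumed form): a per-dimension gate in (0, 1) mixes
    two embeddings of equal width d."""
    joint = np.concatenate([img_emb, rad_emb], axis=-1)  # (batch, 2d)
    gate = sigmoid(joint @ W + b)                        # (batch, d)
    return gate * img_emb + (1.0 - gate) * rad_emb       # (batch, d)


rng = np.random.default_rng(0)
batch, d = 4, 8  # hypothetical batch size and embedding width

# Random stand-ins for the two encoder outputs (assumption, not real features).
img_emb = rng.normal(size=(batch, d))  # would come from the CNN backbone
rad_emb = rng.normal(size=(batch, d))  # would come from the MLP over 91 radiomic features

# Gate parameters; these would be learned during training.
W = rng.normal(scale=0.1, size=(2 * d, d))
b = np.zeros(d)

fused_cat = concat_fusion(img_emb, rad_emb)
fused_gate = gated_fusion(img_emb, rad_emb, W, b)
```

In this formulation the gate decides, per feature dimension, how much to trust the image embedding versus the radiomic embedding, so the fused vector keeps width `d`, whereas concatenation doubles the width and leaves the weighting to the downstream classifier.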