Medical multimodal representation learning aims to integrate heterogeneous data into unified patient representations that support clinical outcome prediction. However, real-world medical datasets commonly contain systematic biases from multiple sources, which poses a significant challenge for medical multimodal representation learning. Existing approaches typically focus on effective multimodal fusion while neglecting inherent biased features that degrade generalization. To address these challenges, we propose a Dual-Stream Feature Decorrelation Framework that identifies and mitigates biases introduced by latent confounders through structural causal analysis. Our method employs dual-stream neural networks to disentangle causal features from spurious correlations, using a generalized cross-entropy loss and mutual information minimization for effective decorrelation. The framework is model-agnostic and can be integrated into existing medical multimodal learning methods. Comprehensive experiments on the MIMIC-IV, eICU, and ADNI datasets demonstrate consistent performance improvements.
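The two loss terms named in the abstract can be sketched in a minimal form: the generalized cross-entropy (GCE) loss interpolates between standard cross-entropy and MAE, down-weighting hard samples, while a cross-correlation penalty between the two streams' features serves as a simple stand-in for mutual information minimization. This is an illustrative sketch only; the function names and the correlation-based MI proxy are assumptions, not the paper's actual implementation.

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    # Generalized cross-entropy: (1 - p_y^q) / q.
    # Recovers standard cross-entropy as q -> 0 and MAE-like behavior as q -> 1,
    # which down-weights samples the biased stream struggles to fit.
    p_y = probs[np.arange(len(labels)), labels]
    return np.mean((1.0 - p_y ** q) / q)

def decorrelation_penalty(z_causal, z_bias):
    # Illustrative MI-minimization proxy (assumption): penalize the squared
    # cross-correlation between the causal-stream and bias-stream features,
    # pushing the two representations toward statistical independence.
    zc = (z_causal - z_causal.mean(0)) / (z_causal.std(0) + 1e-8)
    zb = (z_bias - z_bias.mean(0)) / (z_bias.std(0) + 1e-8)
    corr = zc.T @ zb / len(zc)
    return np.mean(corr ** 2)

# Toy usage: confident predictions incur a small GCE loss, and identical
# features across streams incur a larger decorrelation penalty than
# independent ones.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
loss = gce_loss(probs, labels)
```

In a full training loop, the bias stream would be trained with the GCE loss so it latches onto easy, spurious shortcuts, while the causal stream is trained on the task loss plus the decorrelation penalty against the bias stream's features.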