Motion-Guided Causal Disentanglement for Robust Multi-View Cine Cardiac MRI Diagnosis

Chuankai Xu, Cristiane De Carvalho Singulane, Mohammad Abuannadi, Stephen Chandler, Jeremy Slivnick, Karolina Zareba, Jane Cao, Vidya Nadig, Fabio Fernandes, Seth Uretsky, Diego Perez de Arenaza, Amit Patel, Jianxin Xie

Multi-view cardiac magnetic resonance (CMR) imaging provides complementary anatomical information and is widely used for noninvasive disease assessment. Recent transformer-based models have shown strong representation learning capabilities for CMR analysis. However, they typically learn unified latent embeddings that entangle view-specific anatomical variation with disease-related features. This entanglement biases classifiers toward structural attributes rather than view-invariant pathological patterns. The problem is worse in low-data regimes, particularly for underrepresented cardiac conditions, where limited samples increase susceptibility to shortcut learning and view-dependent decision boundaries. To address this, we propose MoViD, a Motion-Guided View-Disease Disentanglement framework built upon a ViT-MAE backbone. The model explicitly factorizes latent representations into view-specific and disease-discriminative components using dual-branch supervised contrastive objectives. It also uses a gradient-reversal adversarial constraint that minimizes disease leakage into the view embedding. In addition, an annotation-free temporal motion feature derived from inter-frame difference maps localizes the beating heart region and suppresses background artifacts. A focal reweighting mechanism in the contrastive loss mitigates class imbalance. We evaluate the framework on a private clinical venous thrombosis dataset and two public benchmarks (M&Ms and M&Ms2). Across disease classification and cardiac segmentation tasks, our approach consistently outperforms standard transformer baselines. It also performs competitively with large-scale pretrained foundation models, validating the efficacy of structural disentanglement in medical image analysis.
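
The abstract does not specify how the motion feature is computed. The following is a minimal sketch under the assumption that each cine series is a NumPy array of shape `(T, H, W)`. It averages the absolute differences between consecutive frames and rescales the result to [0, 1]. It then pools the map over non-overlapping patches, which gives one score per ViT token. `motion_map`, `patch_motion_scores`, and the 16-pixel patch size are hypothetical names and values chosen for illustration. They are not the authors' implementation.

```python
import numpy as np


def motion_map(cine: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Annotation-free motion prior for a cine series of shape (T, H, W).

    Averages |x[t+1] - x[t]| over time and rescales to [0, 1]: static
    background scores near 0, pixels that change over the cardiac cycle
    score high. (Hypothetical helper, not the paper's code.)
    """
    diffs = np.abs(np.diff(cine.astype(np.float64), axis=0))  # (T-1, H, W)
    m = diffs.mean(axis=0)                                     # (H, W)
    return (m - m.min()) / (m.max() - m.min() + eps)


def patch_motion_scores(mmap: np.ndarray, patch: int) -> np.ndarray:
    """Mean motion per non-overlapping patch x patch tile, in row-major
    order, i.e. one score per ViT token. (Hypothetical helper.)"""
    h, w = mmap.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = mmap.reshape(h // patch, patch, w // patch, patch)
    return tiles.mean(axis=(1, 3)).reshape(-1)


# Synthetic cine: a static bright band plus a small region whose intensity
# oscillates over the cycle (a stand-in for the beating heart).
T, H, W = 8, 32, 32
cine = np.zeros((T, H, W))
cine[:, :, 28:] = 1.0
for t in range(T):
    cine[t, 4:12, 4:12] = 0.5 + 0.5 * np.sin(2 * np.pi * t / T)

scores = patch_motion_scores(motion_map(cine), patch=16)
print(scores)  # only the first (top-left) of the four patches is nonzero
```

The map is computed from intensity changes alone, so it needs no segmentation labels. Bright but static structures, such as the band in the synthetic example, score zero. The sketch stops at per-token scores. Whether those scores reweight tokens, bias attention, or guide masking is a separate design choice.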

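The abstract does not give the exact form of the focal reweighting either. One plausible sketch, stated here as an assumption, starts from the standard supervised contrastive (SupCon) loss. It then multiplies each positive-pair term by `(1 - q)**gamma`, as in focal loss, where `q` is the softmax probability assigned to that positive. In a dual-branch setup, the same function would be applied twice: once to the view embeddings with view labels, and once to the disease embeddings with disease labels. `focal_supcon_loss`, `tau`, `gamma`, and the labels in the usage lines are hypothetical.

```python
import numpy as np


def focal_supcon_loss(z: np.ndarray, labels: np.ndarray,
                      tau: float = 0.1, gamma: float = 2.0) -> float:
    """Supervised contrastive loss with a focal weight (illustrative sketch).

    z: (N, D) embeddings from one branch; labels: (N,) integer labels for
    that branch (view id or disease class). For anchor i and positive p,
    q_ip is the softmax probability of p over all samples a != i; each
    -log q_ip term is scaled by (1 - q_ip) ** gamma. gamma = 0 gives plain SupCon.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> cosine similarity
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, z @ z.T / tau, -np.inf)    # drop a == i from the softmax
    sim = sim - sim.max(axis=1, keepdims=True)          # stabilize exp
    log_q = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    log_q = np.where(not_self, log_q, 0.0)              # diagonal is masked out below
    q = np.exp(log_q)
    pos = (labels[:, None] == labels[None, :]) & not_self
    terms = -((1.0 - q) ** gamma) * log_q               # focal-weighted -log q
    n_pos = pos.sum(axis=1)
    per_anchor = np.where(pos, terms, 0.0).sum(axis=1) / np.maximum(n_pos, 1)
    return float(per_anchor[n_pos > 0].mean())          # skip anchors with no positive


# Dual-branch usage with random stand-in embeddings and hypothetical labels.
rng = np.random.default_rng(0)
z_view, z_dis = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
view_labels = np.array([0, 0, 1, 1, 2, 2, 0, 1])     # hypothetical view ids
disease_labels = np.array([0, 1, 0, 1, 0, 0, 0, 1])  # hypothetical disease ids
total = focal_supcon_loss(z_view, view_labels) + focal_supcon_loss(z_dis, disease_labels)
print(total)
```

Setting `gamma = 0` recovers plain SupCon. Larger values of `gamma` down-weight positives that already sit close to their anchor. This is the usual rationale for focal weighting when frequent classes dominate a batch.
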