Multi-view cardiac magnetic resonance (CMR) imaging provides complementary anatomical information and is widely used for noninvasive disease assessment. Recent transformer-based models have demonstrated strong representation learning capabilities for CMR analysis; however, they typically learn a single unified latent embedding that entangles view-specific anatomical variations with disease-related features. Such entanglement biases classifiers toward structural attributes rather than view-invariant pathological patterns. The problem is exacerbated in low-data regimes, particularly for underrepresented cardiac conditions, where limited samples increase susceptibility to shortcut learning and view-dependent decision boundaries. To address this, we propose MoViD, a Motion-Guided View-Disease Disentanglement framework built upon a ViT-MAE backbone. The model explicitly factorizes latent representations into view-specific and disease-discriminative components using dual-branch supervised contrastive objectives and a gradient-reversal adversarial constraint that minimizes disease leakage into the view embedding. In addition, an annotation-free temporal motion feature, derived from inter-frame difference maps, is introduced to localize the beating heart region and suppress background artifacts. A focal reweighting mechanism is incorporated into the contrastive loss to mitigate class imbalance. We evaluate the framework on a private clinical venous thrombosis dataset and two public benchmarks (M&Ms and M&Ms2). Across disease classification and cardiac segmentation tasks, our approach consistently outperforms standard transformer baselines and is competitive with large-scale pretrained foundation models, supporting the efficacy of structural disentanglement in medical image analysis.
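The annotation-free motion feature mentioned above can be illustrated with a minimal sketch. The abstract states only that the feature is derived from inter-frame difference maps; the specific choices here (averaging absolute differences over consecutive frames, then scaling by the peak value) and the name `motion_map` are assumptions made for illustration, not the authors' implementation. The code is written in plain Python to stay dependency-free; a practical pipeline would more likely use array operations.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # one 2D grayscale frame, indexed [row][col]


def motion_map(frames: Sequence[Frame]) -> List[List[float]]:
    """Per-pixel motion saliency from a cine sequence (illustrative sketch).

    Assumption: saliency is the mean absolute difference between
    consecutive frames, scaled so the most active pixel equals 1.0.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure motion")

    height, width = len(frames[0]), len(frames[0][0])
    n_pairs = len(frames) - 1

    # Accumulate |frame[t+1] - frame[t]| for every pixel.
    total = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                total[y][x] += abs(curr[y][x] - prev[y][x])

    # Average over frame pairs, then normalize to [0, 1].
    mean = [[v / n_pairs for v in row] for row in total]
    peak = max(max(row) for row in mean)
    if peak == 0:
        return mean  # fully static sequence: no motion anywhere
    return [[v / peak for v in row] for row in mean]


# Toy 2x2 sequence: only the top-right pixel changes over time.
cine = [
    [[5, 0], [5, 5]],
    [[5, 4], [5, 5]],
    [[5, 0], [5, 5]],
]
print(motion_map(cine))  # [[0.0, 1.0], [0.0, 0.0]]
```

In this toy sequence the static pixels receive zero weight regardless of their intensity, which is the behavior the abstract relies on to suppress background and highlight the moving heart region.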