Semantic Compensation via Adversarial Removal for Robust Zero-Shot ECG Diagnosis

Recent ECG--language pretraining methods enable zero-shot diagnosis by aligning cardiac signals with clinical text, but they do not explicitly model robustness to partial observation and are typically studied under fully observed ECG settings. In practice, diagnostically critical leads or temporal segments may be missing due to electrode detachment, motion artifacts, or signal corruption, causing severe degradation of cross-modal semantic alignment. In this paper, we propose \textbf{SCAR}, a robust ECG--language pretraining framework for \textbf{S}emantic \textbf{C}ompensation via \textbf{A}dversarial \textbf{R}emoval. SCAR improves robustness by explicitly training the model to remain semantically aligned under semantically critical missingness and to recover diagnostic meaning from the remaining visible evidence. Specifically, we introduce a differentiable adversarial masker that removes the most alignment-critical spatio-temporal ECG tokens during training, forcing the ECG encoder to learn representations that remain semantically aligned with clinical text even when primary diagnostic evidence is missing. Under such adversarial corruption, we equip the ECG encoder with a semantically supervised adaptive selector that learns to reweight the remaining visible tokens and compensate with secondary yet diagnostically informative morphological cues. To evaluate robustness beyond classification accuracy, we further introduce the Counterfactual Missingness Resolution Score (CMRS), which quantifies how well features preserve diagnostic semantics under missingness. Experiments on six datasets show that SCAR consistently improves semantic robustness under joint lead and temporal missingness, with particularly clear advantages in harder cases where primary diagnostic evidence is unavailable, while also yielding stronger linear-probing transferability.
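To make the mask-then-compensate idea concrete, the following is a minimal illustrative sketch and not the SCAR implementation. It assumes that each ECG token and the clinical text are already embedded in a shared space, scores token criticality by cosine similarity to the text embedding, and uses a hard top-$k$ removal in place of the differentiable masker described above. The function names (`adversarial_mask`, `compensate`), the temperature value, and the toy embeddings are all hypothetical, and the text-guided reweighting is only a stand-in for the semantically supervised selector.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def adversarial_mask(tokens, text_emb, k):
    """Return the indices of the k tokens most aligned with the text.

    Assumption: hard top-k selection, a non-differentiable simplification
    of the differentiable adversarial masker.
    """
    scores = [cosine(t, text_emb) for t in tokens]
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def compensate(tokens, text_emb, masked, temperature=0.1):
    """Softmax-reweight the visible tokens and pool them into one vector.

    Assumption: weights come directly from text similarity, standing in
    for a learned, semantically supervised selector.
    """
    visible = [i for i in range(len(tokens)) if i not in masked]
    logits = [cosine(tokens[i], text_emb) / temperature for i in visible]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # numerically stable softmax
    total = sum(exps)
    weights = {i: e / total for i, e in zip(visible, exps)}
    dim = len(tokens[0])
    pooled = [sum(weights[i] * tokens[i][d] for i in visible) for d in range(dim)]
    return weights, pooled

# Toy 2-D embeddings: token 0 is the primary evidence, token 1 a secondary cue.
tokens = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
text_emb = [1.0, 0.0]

masked = adversarial_mask(tokens, text_emb, k=1)
weights, pooled = compensate(tokens, text_emb, masked)
```

In this toy setting the most text-aligned token is removed, and the reweighting concentrates on the next most informative visible token, so the pooled representation stays reasonably aligned with the text embedding.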
