Foundation models for echocardiography often struggle to disentangle anatomical signal from the stochastic speckle and acquisition artifacts inherent to ultrasound. We present EchoJEPA, a foundation model trained on 18 million echocardiograms across 300K patients, the largest pretraining corpus for this modality to date. By using a latent predictive objective, EchoJEPA learns robust anatomical representations that ignore speckle noise. We validate this with a novel multi-view probing framework on frozen backbones, in which EchoJEPA outperforms leading baselines by approximately 20% in left ventricular ejection fraction (LVEF) estimation and 17% in right ventricular systolic pressure (RVSP) estimation. The model is also highly sample-efficient, reaching 79% view classification accuracy with only 1% of the labeled data, versus 42% for the best baseline trained on 100% of it. Crucially, EchoJEPA generalizes better, degrading by only 2% under physics-informed acoustic perturbations compared to 17% for competitors. Finally, its zero-shot performance on pediatric patients surpasses that of fully fine-tuned baselines, establishing latent prediction as a superior paradigm for robust, generalizable medical AI.
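To make the phrase "latent predictive objective" concrete, the following is a minimal sketch of a generic JEPA-style loss, not EchoJEPA's actual implementation. The abstract does not describe the architecture, so everything here is an assumption for illustration: the toy linear encoders, the identity predictor, the exponential-moving-average (EMA) target encoder, and all function and variable names are hypothetical. The point it shows is that the loss is computed between embeddings rather than between pixels, so the model is not asked to reproduce pixel-level noise such as speckle.

```python
import random


def linear(weights, x):
    """Apply a dense layer without bias: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def latent_prediction_loss(visible, masked, context_enc, target_enc, predictor):
    """Mean squared error between predicted and actual target embeddings.

    The error is measured in latent space, never in pixel space.
    """
    z_context = linear(context_enc, visible)   # embedding of the visible region
    z_target = linear(target_enc, masked)      # embedding of the masked region
    z_pred = linear(predictor, z_context)      # prediction of the target embedding
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)


def ema_update(target_enc, context_enc, momentum=0.99):
    """Move target weights toward context weights (assumed EMA target encoder)."""
    return [
        [momentum * t + (1.0 - momentum) * c for t, c in zip(t_row, c_row)]
        for t_row, c_row in zip(target_enc, context_enc)
    ]


# Toy setup: 8-value "patches" mapped to 4-dimensional embeddings.
rng = random.Random(0)
dim_in, dim_latent = 8, 4
context_enc = [[rng.gauss(0.0, 0.1) for _ in range(dim_in)] for _ in range(dim_latent)]
target_enc = [row[:] for row in context_enc]   # target starts as a copy
predictor = [[1.0 if i == j else 0.0 for j in range(dim_latent)] for i in range(dim_latent)]

visible_patch = [rng.random() for _ in range(dim_in)]
masked_patch = [rng.random() for _ in range(dim_in)]

loss = latent_prediction_loss(visible_patch, masked_patch, context_enc, target_enc, predictor)
target_enc = ema_update(target_enc, context_enc)
print("latent prediction loss:", loss)
```

In a real training loop the context encoder and predictor would be updated by gradient descent on this loss, while the target encoder would only follow through the EMA step. That split is a common design choice in JEPA-style methods and is assumed here rather than taken from the source.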