Echocardiography (ECHO) is essential for cardiac assessment, but its video quality and interpretation rely heavily on manual expertise, leading to inconsistent results across clinical and portable devices. ECHO video generation offers a solution by improving automated monitoring through synthetic data and by generating high-quality videos from routine health data. However, existing models often suffer from high computational cost, slow inference, and reliance on complex conditional prompts that require expert annotation. To address these challenges, we propose ECHOPULSE, an ECG-conditioned ECHO video generation model. ECHOPULSE introduces two key advancements: (1) it accelerates ECHO video generation by leveraging VQ-VAE tokenization and masked visual token modeling for fast decoding, and (2) it conditions on readily accessible ECG signals, which are highly coherent with ECHO videos, bypassing complex conditional prompts. To the best of our knowledge, this is the first work to use time-series prompts such as ECG signals for ECHO video generation. ECHOPULSE not only enables controllable synthetic ECHO data generation but also provides updated cardiac function information for disease monitoring and prediction beyond what ECG alone offers. Evaluations on three public and private datasets demonstrate state-of-the-art performance in ECHO video generation on both qualitative and quantitative measures. Additionally, ECHOPULSE can be readily generalized to other modality generation tasks, such as cardiac MRI, fMRI, and 3D CT generation. A demo is available at \url{https://github.com/levyisthebest/ECHOPulse_Prelease}.
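The two ingredients named in the abstract, VQ tokenization and masked visual token modeling, can be illustrated with a toy sketch. This is not the ECHOPULSE implementation: the function names (`quantize`, `make_toy_predictor`, `masked_decode`), the cosine unmasking schedule, and the random linear "predictor" standing in for an ECG-conditioned transformer are all assumptions made for illustration only.

```python
import numpy as np

def quantize(latents, codebook):
    """VQ step: map each latent vector to the index of its nearest codebook entry."""
    # latents: (n, d), codebook: (k, d) -> squared distances of shape (n, k)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def make_toy_predictor(ecg_embedding, vocab_size, n_tokens, seed=0):
    """Hypothetical stand-in for an ECG-conditioned token predictor.

    A real model would also attend to the already committed tokens;
    this toy version depends only on the ECG embedding.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(n_tokens, vocab_size, ecg_embedding.shape[0]))

    def predict_logits(tokens):
        return weights @ ecg_embedding  # shape (n_tokens, vocab_size)

    return predict_logits

def masked_decode(predict_logits, n_tokens, steps=4):
    """Iterative parallel decoding: start fully masked and commit the most
    confident predictions at each step (cosine schedule, assumed here)."""
    MASK = -1
    tokens = np.full(n_tokens, MASK, dtype=np.int64)
    for step in range(1, steps + 1):
        logits = predict_logits(tokens)
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)
        conf[tokens != MASK] = -np.inf  # never re-pick committed positions
        # Number of tokens that should still be masked after this step.
        keep_masked = int(np.floor(n_tokens * np.cos(np.pi / 2 * step / steps)))
        n_commit = int((tokens == MASK).sum()) - keep_masked
        if n_commit > 0:
            idx = np.argsort(-conf)[:n_commit]
            tokens[idx] = pred[idx]
    return tokens

# Usage: tokenize three toy latents, then decode 16 tokens in 4 parallel steps.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ids = quantize(np.array([[0.1, 0.0], [1.9, 2.2], [0.9, 1.1]]), codebook)
predictor = make_toy_predictor(np.ones(8), vocab_size=32, n_tokens=16)
video_tokens = masked_decode(predictor, n_tokens=16, steps=4)
```

The point of the sketch is the cost profile: all 16 tokens are filled in 4 predictor calls rather than 16 sequential ones, which is the general reason masked token modeling is associated with fast decoding. In a full pipeline the decoded indices would be mapped back through the codebook and a VQ-VAE decoder to produce video frames.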