Synthetic data generation is a promising solution to the privacy concerns surrounding the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models across different data modalities. Structured state space models have also recently emerged as a powerful modeling paradigm for capturing long-term dependencies in time series. We put forward SSSD-ECG, which combines these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly assess the quality of the generated samples in two ways: by applying pretrained classifiers to the generated data and by measuring the performance of a classifier trained only on synthetic data. In these evaluations, SSSD-ECG clearly outperforms its GAN-based competitors. Further experiments support the soundness of our approach, including conditional class interpolation and a clinical Turing test, which demonstrates the high quality of the SSSD-ECG samples across a wide range of conditions.
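The two classifier-based evaluations mentioned above can be illustrated with a minimal sketch. It is not the SSSD-ECG pipeline: the Gaussian toy data, the nearest-centroid classifier, and all function names (`make_toy_set`, `fit_centroids`, `predict`, `accuracy`) are assumptions chosen only to keep the example self-contained, and the task is reduced to a single binary label rather than more than 70 ECG statements. The sketch shows the shape of the protocol: a classifier fitted on real data is applied to synthetic samples, and a classifier fitted only on synthetic samples is scored on held-out real data.

```python
import numpy as np


def make_toy_set(rng, n_per_class, shift=0.0):
    """Draw two Gaussian classes; `shift` mimics a mismatch between generator and real data."""
    x0 = rng.normal(loc=-2.0 + shift, scale=1.0, size=(n_per_class, 8))
    x1 = rng.normal(loc=2.0 + shift, scale=1.0, size=(n_per_class, 8))
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return x, y


def fit_centroids(x, y):
    """Fit a nearest-centroid classifier: return class labels and per-class means."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(x, classes, centroids):
    """Assign each sample to the class whose centroid is closest."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]


def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return float((y_true == y_pred).mean())


rng = np.random.default_rng(0)
x_real_train, y_real_train = make_toy_set(rng, 200)
x_real_test, y_real_test = make_toy_set(rng, 200)
# Stand-in for generator output (hypothetical): real-like data with a small shift.
x_synth, y_synth = make_toy_set(rng, 200, shift=0.3)

real_model = fit_centroids(x_real_train, y_real_train)
synth_model = fit_centroids(x_synth, y_synth)

# Reference: train on real, test on real.
trtr = accuracy(y_real_test, predict(x_real_test, *real_model))
# Pretrained (real-trained) classifier applied to synthetic samples.
trts = accuracy(y_synth, predict(x_synth, *real_model))
# Classifier trained only on synthetic samples, scored on real test data.
tstr = accuracy(y_real_test, predict(x_real_test, *synth_model))

print(f"real->real: {trtr:.3f}  real->synthetic: {trts:.3f}  synthetic->real: {tstr:.3f}")
```

In such a setup, the gap between the synthetic-trained score and the real-trained reference gives a rough indication of how much task-relevant information the synthetic samples preserve.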