The generation of high-quality medical time series data is essential for advancing healthcare diagnostics and safeguarding patient privacy. In particular, synthesizing realistic phonocardiogram (PCG) signals could provide a cost-effective and efficient tool for cardiac disease pre-screening. Despite this potential, PCG synthesis for this application has received limited research attention. In this study, we employ and compare three state-of-the-art generative models from different model families (the autoregressive WaveNet, the GAN-based DoppelGANger, and the diffusion-based DiffWave) to generate high-quality PCG data, using data from the George B. Moody PhysioNet Challenge 2022. We evaluate the generated signals with metrics widely used in the time series generation literature, such as mean absolute error (MAE) and maximum mean discrepancy (MMD). Our results show that the generated PCG data closely resembles the original datasets, indicating that these generative models can produce realistic synthetic PCG data. In future work, we plan to incorporate this method into a data augmentation pipeline that synthesizes abnormal PCG signals with heart murmurs, in order to address the current scarcity of abnormal data. We hope this will improve the robustness and accuracy of diagnostic tools in cardiology, particularly their effectiveness in detecting heart murmurs.
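The two metrics named above can be illustrated with a short sketch. It assumes that each PCG recording has been cut into fixed-length segments stored as rows of a NumPy array, that MAE is taken elementwise between paired real and synthetic segments, and that MMD uses a Gaussian (RBF) kernel with a hand-picked bandwidth `sigma` and the biased estimator. These choices, the function names, and the random stand-in data are all hypothetical and only illustrate the general technique. They are not the exact evaluation configuration used in this study.

```python
import numpy as np


def mean_absolute_error(real: np.ndarray, synth: np.ndarray) -> float:
    """Elementwise MAE between two equally shaped arrays of signal segments."""
    return float(np.mean(np.abs(real - synth)))


def gaussian_kernel(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """RBF kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd_squared(x: np.ndarray, y: np.ndarray, sigma: float) -> float:
    """Biased estimate of squared MMD between sample sets x (n, d) and y (m, d)."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


# Hypothetical stand-in data: 64 segments of 400 samples each.
rng = np.random.default_rng(0)
real = rng.normal(size=(64, 400))
synth_close = real + 0.01 * rng.normal(size=real.shape)  # nearly identical copy
synth_far = rng.normal(loc=1.0, size=(64, 400))          # shifted distribution

# Assumed bandwidth: scale with segment length so kernel values stay informative.
sigma = float(np.sqrt(real.shape[1]))

mae_close = mean_absolute_error(real, synth_close)
mmd_close = mmd_squared(real, synth_close, sigma)
mmd_far = mmd_squared(real, synth_far, sigma)
print(f"MAE (close): {mae_close:.4f}")
print(f"MMD^2 (close): {mmd_close:.4f}, MMD^2 (far): {mmd_far:.4f}")
```

Lower values indicate closer agreement for both metrics. MAE compares paired segments point by point. MMD instead compares the two sample distributions as a whole, which is why it is useful for generative models whose outputs have no one-to-one counterpart in the real data. In practice, the kernel bandwidth strongly affects MMD values, and a data-driven choice such as the median pairwise distance is common.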