ECG-Image-Kit: A Synthetic Image Generation Toolbox to Facilitate Deep Learning-Based Electrocardiogram Digitization

We introduce ECG-Image-Kit, an open-source toolbox for generating synthetic ECG images with realistic artifacts from time-series data, and showcase its application in developing algorithms for data augmentation and ECG image digitization. Synthetic data is generated in two stages: first, distortionless ECG images are rendered on a standard ECG paper background; then, various distortions, including handwritten text artifacts, wrinkles, creases, and perspective transformations, are applied to these images. The artifacts and text are synthetically generated and contain no personally identifiable information. The toolbox is used for data augmentation in the 2024 PhysioNet Challenge on Digitization and Classification of ECG Images. As a case study, we employed ECG-Image-Kit to create an ECG image dataset of 21,801 records from the PhysioNet QT database. A denoising convolutional neural network (DnCNN)-based model was developed and trained on this synthetic dataset and used to convert the synthetically generated images back into time-series data for evaluation. The signal-to-noise ratio (SNR) was calculated to assess the quality of image digitization against the ground-truth ECG time-series. The results show an average signal recovery SNR of 11.17 ± 9.19 dB, indicating the value of the synthetic ECG image dataset for training deep learning models. For clinical evaluation, we measured the error between the RR and QT intervals of the estimated and ground-truth time-series. The accuracy of the estimated RR and QT intervals suggests that the respective clinical parameters are preserved. These results demonstrate the effectiveness of a deep learning-based pipeline in accurately digitizing paper ECGs and highlight a generative approach to digitization.
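To make the signal recovery metric concrete, the following is a minimal sketch of how an SNR in dB between a ground-truth ECG and its digitized estimate could be computed. The abstract does not give the exact formula, so the definition used here (signal power over error power, with no mean removal or alignment step) is an assumption, and the function name `recovery_snr_db` is hypothetical rather than part of ECG-Image-Kit.

```python
import numpy as np


def recovery_snr_db(reference, estimate):
    """SNR in dB of `estimate` relative to `reference`.

    Assumed definition (not confirmed by the source):
        SNR = 10 * log10( sum(x^2) / sum((x - x_hat)^2) )
    where x is the ground-truth signal and x_hat the digitized one.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")

    signal_power = np.sum(reference ** 2)
    noise_power = np.sum((reference - estimate) ** 2)
    if noise_power == 0.0:
        # Perfect reconstruction: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(signal_power / noise_power))


# Illustrative usage on a toy signal (a sine wave standing in for an ECG lead).
t = np.linspace(0.0, 1.0, 500)
ground_truth = np.sin(2.0 * np.pi * 5.0 * t)
digitized = 0.9 * ground_truth  # a 10% amplitude error
print(f"SNR: {recovery_snr_db(ground_truth, digitized):.2f} dB")
```

In this toy case the error is one tenth of the signal amplitude, so the power ratio is 100 and the SNR is about 20 dB. In a per-record evaluation such as the one described above, a value like this would be computed for each record and then summarized as a mean and standard deviation.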
