Photoplethysmogram (PPG) and electrocardiogram (ECG) signals are commonly recorded in the intensive care unit (ICU) and operating room (OR). However, the high incidence of poor, incomplete, or inconsistent signal quality can lead to false alarms or diagnostic inaccuracies. Existing methods suffer from limited generalizability, reliance on extensive labeled data, and poor cross-task transferability. To overcome these challenges, we introduce QualityFM, a novel multimodal foundation model for these physiological signals, designed to acquire a general-purpose understanding of signal quality. Our model is pre-trained on a large-scale dataset comprising over 21 million 30-second waveforms, totaling 179,757 hours of data. Our approach uses a dual-track architecture that processes paired physiological signals of differing quality and applies a self-distillation strategy in which an encoder for high-quality signals guides the training of an encoder for low-quality signals. To handle long sequential signals efficiently and capture essential local quasi-periodic patterns, we integrate a windowed sparse attention mechanism into our Transformer-based model. Furthermore, a composite loss function, which combines a direct distillation loss on the encoder outputs with an indirect reconstruction loss based on power and phase spectra, preserves the frequency-domain characteristics of the signals. We pre-train three models with varying parameter counts (9.6 M to 319 M) and demonstrate their efficacy and practical value through transfer learning on three distinct clinical tasks: detection of false ventricular tachycardia alarms, identification of atrial fibrillation, and estimation of arterial blood pressure (ABP) from PPG and ECG signals.
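As an illustration of the composite loss described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a mean-squared-error distillation term, a length-normalized power-spectrum term, and a cosine-based phase term; the function names, the weights `w_power` and `w_phase`, and the exact form of each term are hypothetical choices made only for this example.

```python
import numpy as np


def spectral_terms(x_hat, x):
    """Return (power_term, phase_term) comparing two real 1-D signals.

    Assumed forms (not taken from the paper):
      power_term: MSE between length-normalized power spectra
      phase_term: mean of 1 - cos(phase difference), which is wrap-safe
    """
    n = x.shape[-1]
    spec_hat = np.fft.rfft(x_hat)
    spec = np.fft.rfft(x)
    power_hat = np.abs(spec_hat) ** 2 / n
    power = np.abs(spec) ** 2 / n
    power_term = np.mean((power_hat - power) ** 2)
    phase_diff = np.angle(spec_hat) - np.angle(spec)
    phase_term = np.mean(1.0 - np.cos(phase_diff))
    return power_term, phase_term


def composite_loss(z_student, z_teacher, x_hat, x, w_power=1.0, w_phase=1.0):
    """Direct distillation loss plus indirect spectral reconstruction loss.

    z_student / z_teacher: encoder outputs for the low- and high-quality
    tracks (hypothetical names); x_hat / x: reconstructed and reference
    waveforms.
    """
    distill = np.mean((z_student - z_teacher) ** 2)
    power_term, phase_term = spectral_terms(x_hat, x)
    return distill + w_power * power_term + w_phase * phase_term


if __name__ == "__main__":
    # A synthetic 30-second quasi-periodic waveform sampled at 100 Hz.
    t = np.linspace(0.0, 30.0, 3000, endpoint=False)
    x = np.sin(2 * np.pi * 1.2 * t)
    z = np.ones((4, 8))
    print(composite_loss(z, z, x, x))               # identical inputs
    print(composite_loss(z, z, np.roll(x, 5), x))   # time-shifted reconstruction
```

In this sketch a circularly shifted reconstruction leaves the power spectrum essentially unchanged but is penalized by the phase term, which shows why a loss of this kind would track both spectra rather than power alone.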