Photoplethysmography (PPG)-based blood pressure (BP) estimation is a promising alternative to cuff-based BP measurement. An increasing number of deep learning models have recently been proposed to infer BP from the raw PPG waveform. However, these models have been evaluated predominantly on in-distribution test sets, which raises the question of how well they generalize to external datasets. To investigate this question, we trained five deep learning models on the recently released PulseDB dataset, provided in-distribution benchmarking results on this dataset, and then assessed out-of-distribution performance on several external datasets. On PulseDB, the best model (XResNet1d101) achieved in-distribution mean absolute errors (MAEs) of 9.4 mmHg for systolic BP (SBP) and 6.0 mmHg for diastolic BP (DBP) with subject-specific calibration, and 14.0 mmHg (SBP) and 8.5 mmHg (DBP) without calibration. The corresponding MAEs on external test datasets without calibration ranged from 15.0 to 25.1 mmHg (SBP) and from 7.0 to 10.4 mmHg (DBP). Our results indicate that performance is strongly influenced by differences in BP distributions between datasets. We investigated a simple way of improving performance through sample-based domain adaptation and put forward recommendations for training models with good generalization properties. With this work, we hope to raise awareness among researchers of the importance and challenges of out-of-distribution generalization.
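To make the idea of sample-based domain adaptation more concrete, the following is a minimal sketch of one possible reading: training samples are reweighted so that the BP label distribution of the training set resembles that of an external dataset. This is an illustration only and not necessarily the procedure used in this work. The histogram-based density ratio, the function names `importance_weights` and `weighted_mae`, and the parameters `bins` and `eps` are all assumptions introduced here.

```python
import numpy as np


def importance_weights(source_bp, target_bp, bins=20, eps=1e-8):
    """Per-sample weights for source samples (hypothetical helper).

    Approximates the density ratio p_target(bp) / p_source(bp) with
    histograms over a shared set of bin edges. Assumes the two label
    sets overlap and are not constant.
    """
    source_bp = np.asarray(source_bp, dtype=float)
    target_bp = np.asarray(target_bp, dtype=float)

    # Shared bin edges covering both label distributions.
    lo = min(source_bp.min(), target_bp.min())
    hi = max(source_bp.max(), target_bp.max())
    edges = np.linspace(lo, hi, bins + 1)

    p_src, _ = np.histogram(source_bp, bins=edges, density=True)
    p_tgt, _ = np.histogram(target_bp, bins=edges, density=True)

    # Bin index of every source sample (inner edges give indices 0..bins-1).
    idx = np.clip(np.digitize(source_bp, edges[1:-1]), 0, bins - 1)

    weights = p_tgt[idx] / (p_src[idx] + eps)
    # Normalize so that the average weight is 1.
    return weights / weights.mean()


def weighted_mae(y_true, y_pred, weights):
    """Mean absolute error with per-sample weights (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))
```

Under these assumptions, the weights could be used either as per-sample loss weights during training or to re-estimate a source-domain MAE under the target label distribution. Source samples whose BP values are common in the target dataset receive weights above 1, and those that are rare there receive weights below 1.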