Recent advances in supervised deep learning have demonstrated that human physiological vital signs (e.g., the photoplethysmogram and heart rate) can be measured remotely from facial videos alone. However, the performance of these methods relies heavily on the availability and diversity of real labeled data. Collecting large-scale real-world data with high-quality labels is typically challenging and resource intensive, and storing personal biometric data raises privacy concerns. Synthetic video-based datasets (e.g., SCAMPS~\cite{mcduff2022scamps}) with photo-realistic synthesized avatars have been introduced to alleviate these issues while providing high-quality synthetic data. Nevertheless, a significant gap remains between synthetic and real-world data, which hinders the generalization of neural models trained on these synthetic datasets. In this paper, we propose several measures to add real-world noise to synthetic physiological signals and the corresponding facial videos. We experiment with individual and combined augmentation methods and evaluate our framework on three public real-world datasets. Our results show that we reduce the average mean absolute error (MAE) from 6.9 to 2.0.