Measuring heart and respiratory rates from facial videos is more convenient and user-friendly than using traditional contact-based sensors. However, most current deep learning approaches require ground-truth pulse and respiratory waves for model training, and these are expensive to collect. In this paper, we propose CalibrationPhys, a self-supervised video-based heart and respiratory rate measurement method that calibrates between multiple cameras. CalibrationPhys trains deep learning models without supervised labels by using facial videos captured simultaneously by multiple cameras. Contrastive learning is performed so that the pulse and respiratory waves predicted from synchronized videos of the same scene, captured by different cameras, are treated as positive pairs, while those predicted from different videos are treated as negative pairs. CalibrationPhys also improves the robustness of the models through a data augmentation technique and successfully leverages a model pre-trained for a particular camera. Experimental results on two datasets demonstrate that CalibrationPhys outperforms state-of-the-art heart and respiratory rate measurement methods. Because we optimize camera-specific models using only videos from multiple cameras, our approach makes it easy to use arbitrary cameras for heart and respiratory rate measurements.
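The cross-camera contrastive objective described above can be illustrated with a minimal sketch. This is not the loss used in CalibrationPhys, whose exact form is not given here; it is a generic InfoNCE-style loss, assumed for illustration only. The function names, the use of Pearson correlation as the similarity measure, and the `temperature` parameter are all hypothetical. Row `i` of each array stands for the waveform predicted from clip `i` by one camera's model, so row `i` of camera A and row `i` of camera B form a positive pair and every other combination is a negative pair.

```python
import numpy as np


def normalize_waves(waves):
    """Zero-mean, unit-norm each waveform (rows) so dot products are Pearson correlations."""
    centered = waves - waves.mean(axis=1, keepdims=True)
    return centered / (np.linalg.norm(centered, axis=1, keepdims=True) + 1e-8)


def cross_camera_contrastive_loss(waves_a, waves_b, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's definition).

    waves_a, waves_b: arrays of shape (num_clips, num_frames) holding the
    waveforms predicted from synchronized clips by two cameras' models.
    """
    a = normalize_waves(waves_a)
    b = normalize_waves(waves_b)
    # Pairwise similarity between every clip of camera A and every clip of camera B.
    sim = (a @ b.T) / temperature
    # Numerically stable log-softmax over each row.
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # The diagonal holds the synchronized (positive) pairs.
    return float(-np.mean(np.diag(log_prob)))


# Synthetic example: four clips with different pulse frequencies (Hz).
t = np.linspace(0.0, 10.0, 300)
freqs = np.array([1.0, 1.3, 1.7, 2.1])
camera_a = np.sin(2 * np.pi * freqs[:, None] * t)
# Camera B sees the same pulse with a different gain and offset.
camera_b = 0.8 * np.sin(2 * np.pi * freqs[:, None] * t) + 0.05

loss_synchronized = cross_camera_contrastive_loss(camera_a, camera_b)
loss_mismatched = cross_camera_contrastive_loss(camera_a, camera_b[::-1])
```

In this toy setting, the loss is small when the rows of the two cameras are synchronized and large when the clips are deliberately mismatched, which is the behavior a contrastive objective of this kind is meant to encourage during training.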