Video-based remote physiological measurement uses facial videos to measure the blood volume change signal, also known as the remote photoplethysmography (rPPG) signal. Supervised methods for rPPG measurement achieve good performance, but they require facial videos paired with ground truth (GT) physiological signals, which are often costly and difficult to obtain. In this paper, we propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings. We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function. We further incorporate the GT signals into contrastive learning to adapt to partial or misaligned labels. The contrastive loss encourages rPPG/GT signals from the same video to be grouped together, while pushing those from different videos apart. We evaluate our methods on five publicly available datasets that include both RGB and near-infrared videos. Contrast-Phys+ outperforms the state-of-the-art supervised methods, even when using partially available or misaligned GT signals, or no labels at all. Additionally, we highlight the advantages of our methods in terms of computational efficiency, noise robustness, and generalization.
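To make the grouping idea concrete, the following is a minimal NumPy sketch of a contrastive loss of this kind. It is not the authors' implementation. The function names, the comparison of signals through normalized power spectra, the 0.6 to 4.0 Hz band, the 30 Hz frame rate, and the mean squared error distance are all assumptions chosen for illustration. Each input array holds several rPPG signals from one video, one signal per row. In a weakly-supervised setting, a GT signal for a video could plausibly be appended as one more row of that video's array.

```python
import numpy as np


def normalized_psd(signals, fs, band=(0.6, 4.0)):
    """Band-limited power spectra of the rows of `signals`, each summing to 1.

    `band` (in Hz) is an assumed plausible heart-rate range, not a value
    taken from the abstract.
    """
    signals = signals - signals.mean(axis=1, keepdims=True)
    freqs = np.fft.rfftfreq(signals.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(signals, axis=1)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    power = power[:, mask]
    return power / (power.sum(axis=1, keepdims=True) + 1e-12)


def mean_pairwise_mse(a, b, skip_diagonal=False):
    """Mean squared error averaged over all (row of a, row of b) pairs.

    With `skip_diagonal=True`, `a` and `b` must have the same number of
    rows, and the pairs (i, i) are excluded from the average.
    """
    diff = a[:, None, :] - b[None, :, :]
    mse = (diff ** 2).mean(axis=2)
    if skip_diagonal:
        off_diagonal = ~np.eye(len(a), dtype=bool)
        return mse[off_diagonal].mean()
    return mse.mean()


def contrastive_loss(sig_a, sig_b, fs=30.0):
    """Hypothetical loss: pull same-video spectra together, push others apart.

    `sig_a` and `sig_b` have shape (n_signals, n_samples) with
    n_signals >= 2, and come from two different videos.
    """
    psd_a = normalized_psd(sig_a, fs)
    psd_b = normalized_psd(sig_b, fs)
    # Positive term: distance between signals from the same video.
    positive = 0.5 * (
        mean_pairwise_mse(psd_a, psd_a, skip_diagonal=True)
        + mean_pairwise_mse(psd_b, psd_b, skip_diagonal=True)
    )
    # Negative term: distance between signals from different videos.
    negative = mean_pairwise_mse(psd_a, psd_b)
    return positive - negative
```

Minimizing this quantity lowers the spectral distance within a video and raises it across videos. Comparing spectra rather than raw waveforms makes the comparison insensitive to phase offsets between signals, which is one simple way to tolerate temporally misaligned signals.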