Toward Motion Robustness: A Masked Attention Regularization Framework in Remote Photoplethysmography

Facial video-based remote photoplethysmography (rPPG) measurement has attracted growing interest recently, with a focus on assessing vital signs such as heart rate and heart rate variability. Although previous methods have made progress on static datasets, they are hindered by inaccurate region of interest (ROI) localization and motion artifacts, and show limited generalization in real-world scenarios. To address these challenges, we propose a novel masked attention regularization (MAR-rPPG) framework that mitigates the impact of inaccurate ROI localization and complex motion artifacts. Specifically, our approach first integrates a masked attention regularization mechanism into the rPPG field to capture the visual semantic consistency of facial clips; it also employs a masking technique to prevent the model from overfitting to inaccurate ROIs, which would otherwise degrade its performance. Furthermore, we propose an enhanced rPPG expert aggregation (EREA) network as the backbone to obtain rPPG signals and attention maps simultaneously. The EREA network can discriminate divergent attention across facial areas while retaining the consistency of spatiotemporal attention maps. For motion robustness, a simple open-source detector, MediaPipe, is sufficient for data preprocessing in our framework, owing to the framework's strong capability for rPPG signal extraction and attention regularization. Extensive experiments on three benchmark datasets (UBFC-rPPG, PURE, and MMPD) substantiate the superiority of the proposed method, which outperforms recent state-of-the-art works by a considerable margin.
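The idea of regularizing attention under masking can be illustrated with a minimal sketch. The abstract does not specify the loss, the masking scheme, or how the EREA network produces attention maps, so everything below is an assumption for illustration only: `random_patch_mask`, `toy_attention`, and `attention_consistency_loss` are hypothetical names, the patch size and drop ratio are arbitrary, and `toy_attention` is a hand-written stand-in (temporal variation of the green channel) rather than a learned attention map.

```python
import numpy as np


def random_patch_mask(height, width, patch=8, drop_ratio=0.3, rng=None):
    """Return an (H, W) float mask: 0 where a patch is hidden, 1 elsewhere.

    Patch size and drop ratio are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid_h = -(-height // patch)  # ceiling division
    grid_w = -(-width // patch)
    keep = rng.random((grid_h, grid_w)) >= drop_ratio
    # Expand each grid cell to a patch x patch block, then crop to (H, W).
    mask = np.repeat(np.repeat(keep, patch, axis=0), patch, axis=1)
    return mask[:height, :width].astype(np.float32)


def toy_attention(clip):
    """Hypothetical stand-in for a network's spatial attention map.

    clip: array of shape (T, H, W, 3). Returns an (H, W) non-negative map
    that sums to 1, based on the temporal variation of the green channel.
    """
    green = clip[..., 1].astype(np.float64)
    energy = green.std(axis=0) + 1e-8
    return energy / energy.sum()


def attention_consistency_loss(attn_full, attn_masked, mask):
    """Mean squared difference between two attention maps on visible pixels."""
    visible = mask > 0
    if not visible.any():
        return 0.0
    diff = (attn_full - attn_masked) ** 2
    return float(diff[visible].mean())


# Usage on a synthetic clip (random data, not a real face video).
rng = np.random.default_rng(0)
clip = rng.random((16, 32, 32, 3)).astype(np.float32)
mask = random_patch_mask(32, 32, patch=8, drop_ratio=0.3, rng=rng)
masked_clip = clip * mask[None, :, :, None]  # hide the dropped patches
loss = attention_consistency_loss(
    toy_attention(clip), toy_attention(masked_clip), mask
)
```

In a training setup along these lines, a term like `loss` would presumably be added to the rPPG signal loss so that attention stays consistent whether or not parts of the face are hidden; how the actual framework weights or formulates this term is not stated in the abstract.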
