Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, with high potential in applications such as healthcare, affective computing, and anti-spoofing. Because rPPG signals are periodic, the Transformer's capacity for capturing long-range dependencies has been assumed to be advantageous for them. However, existing approaches have not conclusively demonstrated that Transformers outperform traditional convolutional neural network methods. This gap may stem from a lack of thorough exploration of rPPG periodicity. In this paper, we propose RhythmFormer, a fully end-to-end Transformer-based method that extracts rPPG signals by explicitly leveraging their quasi-periodic nature. The core module, the Hierarchical Temporal Periodic Transformer, hierarchically extracts periodic features at multiple temporal scales. It uses dynamic sparse attention based on periodicity in the temporal domain, allowing fine-grained modeling of rPPG features. Furthermore, a fusion stem is proposed to guide self-attention toward rPPG features effectively; it can be easily transferred to existing methods to enhance their performance significantly. In comprehensive experiments, RhythmFormer achieves state-of-the-art performance with fewer parameters and reduced computational complexity compared to previous approaches. The code is available at https://github.com/zizheng-guo/RhythmFormer.
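To make the idea of sparse attention over a periodic signal concrete, the following is a minimal sketch and not the RhythmFormer implementation. It assumes a simple top-k rule: each temporal token attends only to the k tokens whose features match it best. The function names `topk_sparse_attention` and `toy_tokens`, the top-k selection rule, and the two-dimensional phase features are all hypothetical choices made for illustration; the actual routing and feature extraction are defined in the authors' repository.

```python
import math


def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the highest score.

    Illustrative assumption: sparsity is obtained by a plain top-k over
    scaled dot-product scores. This is not taken from the paper.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product score between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Indices of the k best-matching keys (the "sparse" part).
        kept = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Numerically stable softmax over the kept scores only.
        peak = max(scores[j] for j in kept)
        weights = [math.exp(scores[j] - peak) for j in kept]
        total = sum(weights)
        # Weighted sum of the selected value vectors.
        out = [0.0] * len(values[0])
        for w, j in zip(weights, kept):
            for d, v in enumerate(values[j]):
                out[d] += (w / total) * v
        outputs.append(out)
    return outputs


def toy_tokens(num_frames, period):
    """Hypothetical per-frame features encoding the phase of a periodic signal."""
    return [
        [math.cos(2 * math.pi * t / period), math.sin(2 * math.pi * t / period)]
        for t in range(num_frames)
    ]


# 12 frames covering three periods of 4 frames each.
tokens = toy_tokens(num_frames=12, period=4)
attended = topk_sparse_attention(tokens, tokens, tokens, k=3)
```

In this toy setting, frames that share the same phase (for example frames 0, 4, and 8) have nearly identical features, so with k=3 each frame aggregates only its same-phase counterparts from the other periods. Setting k to the number of frames recovers ordinary dense attention, which also mixes in frames at other phases.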