Face presentation attacks, also known as spoofing attacks, pose a significant threat to biometric systems that rely on facial recognition, such as access control, mobile payment, and identity verification systems. To prevent spoofing, several video-based methods in the literature analyze facial motion across successive video frames. However, estimating motion between adjacent frames is challenging and computationally expensive. In this paper, we reformulate the face anti-spoofing task as a motion prediction problem and introduce a deep ensemble learning model with a frame skipping mechanism. The proposed frame skipping is based on uniform sampling: the original video is divided into fixed-size clips, and every nth frame of each clip is selected so that temporal patterns can be readily perceived during the training of three different recurrent neural networks (RNNs). Motivated by the performance of the individual RNNs, we develop a meta-model that combines their predictions to improve overall recognition performance. Extensive experiments were conducted on four datasets, and state-of-the-art performance is reported for MSU-MFSD (3.12%), Replay-Attack (11.19%), and OULU-NPU (12.23%) in terms of half total error rate (HTER) in the most challenging cross-dataset test scenario.
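The frame skipping step and the HTER metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`split_into_clips`, `skip_frames`, `hter`), the choice to drop trailing frames that do not fill a whole clip, and the choice to start sampling at the first frame of each clip are all assumptions made for illustration. HTER is computed with its standard definition, the mean of the false acceptance rate and the false rejection rate.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_clips(frames: Sequence[T], clip_len: int) -> List[Sequence[T]]:
    """Divide a video (a sequence of frames) into fixed-size, non-overlapping clips.

    Assumption: trailing frames that do not fill a whole clip are dropped.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    n_clips = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def skip_frames(clip: Sequence[T], n: int) -> Sequence[T]:
    """Uniformly sample a clip by keeping every n-th frame.

    Assumption: sampling starts at the first frame, so indices 0, n, 2n, ...
    are kept.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return clip[::n]


def hter(far: float, frr: float) -> float:
    """Half total error rate: the mean of the false acceptance rate (FAR)
    and the false rejection rate (FRR)."""
    return (far + frr) / 2.0


if __name__ == "__main__":
    video = list(range(20))               # stand-in for 20 decoded frames
    clips = split_into_clips(video, clip_len=8)
    sampled = [skip_frames(c, n=3) for c in clips]
    print(sampled)
    print(hter(0.25, 0.75))
```

In a pipeline like the one the abstract describes, each sampled clip would then be passed to the recurrent models; the clip length and the skip factor n are hyperparameters that the abstract does not specify.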