Face presentation attacks (PAs), also known as spoofing attacks, pose a substantial threat to biometric systems that rely on facial recognition, such as access control, mobile payment, and identity verification systems. To mitigate this risk, several video-based methods in the literature analyze facial motion across successive video frames. However, estimating the motion between adjacent frames is challenging and computationally expensive. In this paper, we recast face anti-spoofing as a motion prediction problem and introduce a deep ensemble learning model with a frame skipping mechanism. Specifically, the proposed frame skipping samples the video uniformly: the original video is divided into clips of fixed size, and every nth frame of each clip is selected, so that temporal patterns are easier to perceive when training three different recurrent neural networks (RNNs). Motivated by the performance of the individual RNNs, we develop a meta-model that combines their predictions to improve overall detection performance. Extensive experiments were performed on four datasets, and state-of-the-art half total error rates (HTERs) are reported on the MSU-MFSD (3.12%), Replay-Attack (11.19%), and OULU-NPU (12.23%) databases in the most challenging cross-dataset testing scenario.
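As an illustration of the frame skipping step and of the reported metric, the following is a minimal sketch and not the implementation used in the paper. The function names `split_into_clips`, `skip_frames`, and `hter` are hypothetical. The sketch assumes that trailing frames that do not fill a whole clip are dropped and that sampling starts at the first frame of each clip; the abstract specifies neither. HTER is computed with its standard definition, the mean of the false acceptance rate (FAR) and the false rejection rate (FRR).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")  # a frame can be any object, e.g. an image array


def split_into_clips(frames: Sequence[T], clip_size: int) -> List[List[T]]:
    """Divide a video into consecutive, non-overlapping clips of fixed size.

    Assumption: trailing frames that do not fill a whole clip are dropped.
    """
    if clip_size <= 0:
        raise ValueError("clip_size must be positive")
    n_clips = len(frames) // clip_size
    return [list(frames[i * clip_size:(i + 1) * clip_size]) for i in range(n_clips)]


def skip_frames(clip: Sequence[T], n: int) -> List[T]:
    """Keep every nth frame of a clip.

    Assumption: sampling starts at the first frame of the clip.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return list(clip[::n])


def hter(far: float, frr: float) -> float:
    """Half total error rate: the mean of FAR and FRR."""
    return (far + frr) / 2.0


if __name__ == "__main__":
    video = list(range(20))  # stand-in for 20 frames
    clips = split_into_clips(video, clip_size=8)
    sampled = [skip_frames(clip, n=2) for clip in clips]
    print(sampled)  # [[0, 2, 4, 6], [8, 10, 12, 14]]
```

With 20 frames, a clip size of 8, and n = 2, the sketch yields two clips of four frames each. Each sampled clip would then be passed to the recurrent models as one input sequence.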