Enhancing Mobile Face Anti-Spoofing: A Robust Framework for Diverse Attack Types under Screen Flash

Face anti-spoofing (FAS) is crucial for securing face recognition systems. However, existing FAS methods that rely on handcrafted binary or pixel-wise labels are limited by the diversity of presentation attacks (PAs). In this paper, we propose an attack-type-robust face anti-spoofing framework under light flash, called ATR-FAS. Because different attack types produce different imaging characteristics, traditional FAS methods based on a single binary classification network may yield an excessive intra-class distance among spoof faces, which makes the decision boundary hard to learn. We therefore employ multiple networks to reconstruct multi-frame depth maps as auxiliary supervision, with each network specializing in one attack type. We introduce a dual gate module (DGM) consisting of a type gate and a frame-attention gate, which perform attack type recognition and multi-frame attention generation, respectively. The outputs of the DGM are used as weights to mix the results of the multiple expert networks. This mixture of experts enables ATR-FAS to generate spoof-differentiated depth maps and to detect spoof faces stably, regardless of the type of PA. Moreover, we design a differential normalization procedure that converts the original flash frames into differential frames. This simple but effective processing enhances the details in flash frames and aids the generation of depth maps. To verify the effectiveness of our framework, we collected a large-scale dataset containing 12,660 live and spoof videos with diverse PAs under dynamic flash from the smartphone screen. Extensive experiments show that the proposed ATR-FAS significantly outperforms existing state-of-the-art methods. The code and dataset will be available at https://github.com/Chaochao-Lin/ATR-FAS.
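
To make the two mechanisms described above more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify the exact formulas, so several choices here are assumptions: differential frames are taken as differences of consecutive flash frames scaled by their per-frame maximum absolute value, both gates are assumed to emit logits that are turned into weights with a softmax, and the function names `differential_frames` and `mix_experts` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def differential_frames(frames, eps=1e-6):
    """Convert flash frames of shape (T, H, W) into T-1 differential frames.

    Assumption: subtract consecutive frames, then divide each difference
    by its own maximum absolute value so every frame lies in [-1, 1].
    """
    diffs = frames[1:] - frames[:-1]                        # (T-1, H, W)
    scale = np.abs(diffs).max(axis=(1, 2), keepdims=True)   # (T-1, 1, 1)
    return diffs / (scale + eps)


def mix_experts(expert_depths, type_logits, frame_logits):
    """Fuse per-expert, per-frame depth maps into a single depth map.

    expert_depths: (K, T, H, W), one depth map per expert and frame.
    type_logits:   (K,), assumed output of the type gate.
    frame_logits:  (T,), assumed output of the frame-attention gate.
    """
    type_w = softmax(type_logits)                              # (K,)
    frame_w = softmax(frame_logits)                            # (T,)
    # Weight the experts by the predicted attack type: (T, H, W).
    per_frame = np.tensordot(type_w, expert_depths, axes=1)
    # Weight the frames by the attention scores: (H, W).
    return np.tensordot(frame_w, per_frame, axes=1)
```

Under these assumptions, a clip of `T` flash frames yields `T-1` differential frames, and `K` expert outputs of shape `(K, T-1, H, W)` collapse into one `(H, W)` depth map. Because both weight vectors sum to one, the fused map stays within the value range of the expert outputs.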
