Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks. With advances in sensor manufacturing and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they struggle to generalize to unseen attacks and deployment conditions. These difficulties arise from (1) modality unreliability, where some modalities, such as depth and infrared, undergo significant domain shifts across environments, causing unreliable information to spread during cross-modal feature fusion, and (2) modality imbalance, where over-reliance on a dominant modality during training hinders the convergence of the others, reducing effectiveness against attack types that are indistinguishable using the dominant modality alone. To address modality unreliability, we propose the Uncertainty-Guided Cross-Adapter (U-Adapter), which recognizes unreliably detected regions within each modality and suppresses their impact on other modalities. For modality imbalance, we propose a Rebalanced Modality Gradient Modulation (ReGrad) strategy that rebalances the convergence speed of all modalities by adaptively adjusting their gradients. In addition, we provide the first large-scale benchmark for evaluating multi-modal FAS performance under domain generalization scenarios. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Source code and protocols will be released at https://github.com/OMGGGGG/mmdg.