Adversarial attacks aim to disrupt a target system by adding specific noise to input samples, which threatens the security and robustness of facial recognition systems. Although existing defense techniques detect some specific adversarial faces (adv-faces) with high accuracy, new attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before the defense can be implemented, making it impractical to defend against newly emerging attacks that defenders have not seen. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to both the victim facial recognition system and unseen attacks. Regarding adv-faces as out-of-distribution data, we then introduce a novel cascaded system for adv-face detection, which consists of training-data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier that focuses on abnormal local color aberrations. Experiments on the LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.
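To make the two central ideas concrete, the following is a minimal sketch under stated assumptions, not the method evaluated in the paper. It shows (1) building a pseudo adv-face by perturbing a real face and (2) max-pooling local scores so that a single abnormal region decides the image-level result. The abstract does not specify the three noise patterns or the classifier architecture, so the sign-noise pattern, the neighbor-difference score, and the names `self_perturb`, `patch_scores`, and `image_score` are all hypothetical stand-ins; in the paper the local scores would come from a trained network rather than a hand-written statistic.

```python
import numpy as np


def self_perturb(face, eps=8 / 255, rng=None):
    """Return a pseudo adv-face: a real face plus bounded per-pixel sign noise.

    Assumption: this single pattern is illustrative only; it is not one of
    the paper's three heuristically designed noise patterns.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = eps * rng.choice([-1.0, 1.0], size=face.shape)
    return np.clip(face + noise, 0.0, 1.0)


def patch_scores(face, patch=8):
    """Toy local anomaly map for an H x W x 3 image with values in [0, 1].

    Each entry is the mean absolute difference between horizontally adjacent
    pixels inside one patch (a hypothetical proxy for local color aberration).
    """
    h, w, _ = face.shape
    diff = np.abs(np.diff(face, axis=1)).mean(axis=2)  # shape (h, w - 1)
    diff = np.pad(diff, ((0, 0), (0, 1)))              # pad back to (h, w)
    gh, gw = h // patch, w // patch
    blocks = diff[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
    return blocks.mean(axis=(1, 3))                    # shape (gh, gw)


def image_score(scores):
    """Max-pool the local map: the most abnormal patch sets the image score."""
    return float(np.max(scores))


if __name__ == "__main__":
    real = np.full((32, 32, 3), 0.5)  # flat gray placeholder for a real face
    pseudo = self_perturb(real, rng=np.random.default_rng(0))
    print(image_score(patch_scores(real)), image_score(patch_scores(pseudo)))
```

Max-pooling is used instead of averaging so that a perturbation confined to a small region is not diluted by the clean remainder of the face. Pairs of `real` and `pseudo` images produced this way could serve as the two classes for training a binary detector without any attack data.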