Improving the Transferability of Adversarial Attacks on Face Recognition with Beneficial Perturbation Feature Augmentation

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Face recognition (FR) models can be easily fooled by adversarial examples, which are crafted by adding imperceptible perturbations to benign face images. The existence of adversarial face examples poses a great threat to the security of society. To help build a more sustainable digital nation, this paper improves the transferability of adversarial face examples in order to expose more blind spots of existing FR models. Although generating hard samples has proven effective for improving the generalization of models in training tasks, whether the same idea can improve the transferability of adversarial face examples remains unexplored. To this end, based on the properties of hard samples and the symmetry between training tasks and adversarial attack tasks, we propose the concept of hard models, which play a role in adversarial attack tasks similar to that of hard samples in training. Building on this concept, we propose a novel attack method called Beneficial Perturbation Feature Augmentation Attack (BPFA), which reduces the overfitting of adversarial examples to surrogate FR models by constantly generating new hard models with which to craft the adversarial examples. Specifically, during backpropagation, BPFA records the gradients on pre-selected feature maps and uses the gradient on the input image to craft the adversarial example. In the next forward propagation, BPFA uses the recorded gradients to add beneficial perturbations to the corresponding feature maps, which increases the loss. Extensive experiments demonstrate that BPFA can significantly boost the transferability of adversarial attacks on FR.
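
The record-then-perturb loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-layer toy "FR model" `e = W2 @ tanh(W1 @ x)`, the squared-distance impersonation loss, the sign-based step rule, and every name and hyperparameter (`bpfa_attack`, `eps`, `alpha`, `feat_eps`, `steps`) are assumptions made for illustration only. Gradients are derived by hand so that the example depends only on NumPy.

```python
import numpy as np

def bpfa_attack(x, target_emb, W1, W2, eps=0.1, alpha=0.01, feat_eps=0.05, steps=20):
    """Toy BPFA-style loop (hypothetical) on the model e = W2 @ tanh(W1 @ x)."""
    x_adv = x.copy()
    # Gradient recorded on the hidden feature; zero before the first backward pass.
    feat_grad = np.zeros(W1.shape[0])
    for _ in range(steps):
        # Forward pass: add a perturbation along the recorded feature gradient.
        # Moving along +gradient tends to raise the loss, which makes the
        # surrogate "harder" for the adversarial example (assumed sign rule).
        h_clean = np.tanh(W1 @ x_adv)
        h = h_clean + feat_eps * np.sign(feat_grad)
        diff = W2 @ h - target_emb          # loss = 0.5 * ||diff||^2

        # Backward pass (hand-derived): record the feature gradient for the
        # next forward pass and compute the gradient on the input.
        feat_grad = W2.T @ diff
        input_grad = W1.T @ (feat_grad * (1.0 - h_clean ** 2))

        # Craft the adversarial example: step down the loss, then project
        # back into the eps-ball around x and the valid pixel range [0, 1].
        x_adv = x_adv - alpha * np.sign(input_grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Usage on random toy data (all values are illustrative).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(4, 8))
x = rng.uniform(0.2, 0.8, size=16)                       # "source face"
target_emb = W2 @ np.tanh(W1 @ rng.uniform(0.2, 0.8, size=16))  # "target identity"
x_adv = bpfa_attack(x, target_emb, W1, W2)
```

In this sketch the feature perturbation and the input perturbation pull in opposite directions: the former raises the loss while the latter lowers it, which is one way to read the abstract's claim that the adversarial example is crafted against a continually regenerated hard model rather than against a fixed surrogate.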
