Improving the Transferability of Adversarial Attacks on Face Recognition with Beneficial Perturbation Feature Augmentation

From arXiv. © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Face recognition (FR) models can be easily fooled by adversarial examples, which are crafted by adding imperceptible perturbations to benign face images. The existence of adversarial face examples poses a great threat to the security of society. To help build a more sustainable digital nation, this paper improves the transferability of adversarial face examples in order to expose more blind spots of existing FR models. Although generating hard samples has proven effective at improving the generalization of models in training tasks, whether this idea can improve the transferability of adversarial face examples remains unexplored. To this end, based on the properties of hard samples and the symmetry between training tasks and adversarial attack tasks, we propose the concept of hard models, which play a role in adversarial attack tasks similar to that of hard samples in training tasks. Using this concept, we propose a novel attack method called Beneficial Perturbation Feature Augmentation Attack (BPFA), which reduces the overfitting of adversarial examples to surrogate FR models by constantly generating new hard models to craft the adversarial examples. Specifically, during backpropagation, BPFA records the gradients on pre-selected feature maps and uses the gradient on the input image to craft the adversarial example. In the next forward propagation, BPFA uses the recorded gradients to add beneficial perturbations to the corresponding feature maps to increase the loss. Extensive experiments demonstrate that BPFA can significantly boost the transferability of adversarial attacks on FR.
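The record-then-perturb loop described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything below is assumed for illustration: a toy two-layer linear surrogate (`W1`, `W2`) stands in for an FR model, the loss is the squared distance between the embedding and a `target` embedding, the input step descends that loss with a sign-gradient update, and the feature perturbation is the sign of the recorded gradient scaled by a hypothetical step size `feat_step`.

```python
def matvec(W, v):
    """Matrix-vector product W v for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matTvec(W, v):
    """Transposed product W^T v, used to backpropagate through W."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]

def sign(v):
    return (v > 0) - (v < 0)

def forward_backward(W1, W2, x, target, delta):
    """One forward and backward pass through the toy surrogate.

    delta is the perturbation added to the (single) pre-selected feature map.
    Returns the loss, the gradient on the feature map, and the gradient on x.
    """
    feat = [f + d for f, d in zip(matvec(W1, x), delta)]  # perturbed feature map
    emb = matvec(W2, feat)
    diff = [e - t for e, t in zip(emb, target)]
    loss = sum(d * d for d in diff)
    g_emb = [2 * d for d in diff]
    g_feat = matTvec(W2, g_emb)  # gradient on the feature map (recorded)
    g_x = matTvec(W1, g_feat)    # gradient on the input image
    return loss, g_feat, g_x

def bpfa_sketch(W1, W2, x, target, eps=0.1, steps=10, feat_step=0.05):
    """Iterative attack with feature perturbations (illustrative assumptions only)."""
    alpha = eps / steps
    x_adv = list(x)
    delta = [0.0] * len(W1)  # no feature perturbation in the first pass
    for _ in range(steps):
        _, g_feat, g_x = forward_backward(W1, W2, x_adv, target, delta)
        # Input update: move x_adv to decrease the loss, staying within
        # an L-infinity ball of radius eps around the benign input.
        x_adv = [min(max(xa - alpha * sign(g), x0 - eps), x0 + eps)
                 for xa, g, x0 in zip(x_adv, g_x, x)]
        # Recorded feature gradient becomes the perturbation for the next
        # forward pass; stepping along its sign increases the loss.
        delta = [feat_step * sign(g) for g in g_feat]
    return x_adv
```

In this sketch the surrogate plus its current `delta` plays the role of a new "hard model" at every iteration: the input update has to make progress against a feature map that is being pushed in the loss-increasing direction. A real attack would hook the chosen layers of a deep FR network and would also clip pixel values to the valid image range, both of which are omitted here.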
