Face recognition (FR) models are easily fooled by adversarial examples, which are crafted by adding imperceptible perturbations to benign face images. To improve the transferability of adversarial face examples, we propose a novel attack method called Beneficial Perturbation Feature Augmentation Attack (BPFA). BPFA reduces the overfitting of adversarial examples to surrogate FR models by continually generating new models, which play a role similar to that of hard samples, to craft the adversarial examples. Specifically, during backpropagation, BPFA records the gradients on pre-selected features and uses the gradient with respect to the input image to craft the adversarial example. In the next forward propagation, BPFA uses the recorded gradients to add perturbations (i.e., beneficial perturbations) to the corresponding features, where they are pitted against the adversarial example. The optimization of the adversarial example and the optimization of the beneficial perturbations added to the features together form a minimax two-player game. Extensive experiments demonstrate that BPFA significantly boosts the transferability of adversarial attacks on FR.
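The loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-layer linear "surrogate model" (`W1`, `W2`), the squared-distance impersonation loss, the sign-based updates, and the hyperparameters `eps`, `alpha`, and `beta` are all assumptions chosen to keep the example self-contained. The adversarial example descends the loss, while the feature perturbation built from the recorded feature gradient ascends it in the next forward pass, which is one way to read the minimax game.

```python
def matvec(W, v):
    """Compute W @ v for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def mat_t_vec(W, v):
    """Compute W^T @ v for a list-of-rows matrix W."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]


def sign(a):
    """Return -1, 0, or 1 according to the sign of a."""
    return (a > 0) - (a < 0)


def forward_backward(x, target, W1, W2, feat_pert):
    """One forward/backward pass of a toy surrogate (an assumption).

    The pre-selected feature is f = W1 x; the perturbation is added to it
    before the second layer. Returns the loss, the gradient on the
    feature, and the gradient on the input.
    """
    f = matvec(W1, x)
    f_pert = [a + p for a, p in zip(f, feat_pert)]
    e = matvec(W2, f_pert)
    diff = [a - b for a, b in zip(e, target)]
    loss = sum(d * d for d in diff)          # squared distance to target embedding
    grad_e = [2 * d for d in diff]
    grad_f = mat_t_vec(W2, grad_e)           # gradient recorded on the feature
    grad_x = mat_t_vec(W1, grad_f)           # gradient on the input image
    return loss, grad_f, grad_x


def bpfa_sketch(x0, target, W1, W2, eps=0.1, alpha=0.02, beta=0.05, steps=20):
    """Hypothetical BPFA-style loop: min over x, max over feature perturbation."""
    x = list(x0)
    feat_pert = [0.0] * len(W1)              # no perturbation in the first pass
    for _ in range(steps):
        _, grad_f, grad_x = forward_backward(x, target, W1, W2, feat_pert)
        # Adversarial example: step down the loss, stay within the eps-ball of x0.
        x = [xi - alpha * sign(g) for xi, g in zip(x, grad_x)]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
        # Beneficial perturbation for the NEXT forward pass: step up the loss,
        # using the feature gradient recorded in this backward pass.
        feat_pert = [beta * sign(g) for g in grad_f]
    return x
```

For example, `bpfa_sketch([0.5, 0.2], [0.0, 0.0], W, W)` with `W` the 2x2 identity returns a perturbed input that stays within `eps` of the original in every coordinate. In a real attack the surrogate would be a deep FR model and the features would be intermediate activations; the toy linear model here only shows the order of operations.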