Adversarial Training (AT) effectively improves the robustness of Deep Neural Networks (DNNs) to adversarial attacks. Generally, AT trains DNN models on adversarial examples crafted within a pre-defined, fixed perturbation bound. However, the natural examples from which these adversarial examples are crafted exhibit varying degrees of intrinsic vulnerability, so crafting adversarial examples with the same fixed perturbation radius for every instance may not realize the full potential of AT. Motivated by this observation, we propose two simple, computationally cheap vulnerability-aware reweighting functions for assigning perturbation bounds to the adversarial examples used in AT: Margin-Weighted Perturbation Budget (MWPB) and Standard-Deviation-Weighted Perturbation Budget (SDWPB). The proposed methods assign a perturbation radius to each adversarial example based on the vulnerability of its corresponding natural example. Experimental results show that the proposed methods yield genuine improvements in the robustness of AT algorithms against various adversarial attacks.
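To make the idea of a per-example perturbation budget concrete, the following is a minimal sketch of a margin-based weighting. The abstract does not state the weighting functions themselves, so everything specific here is an assumption made for illustration: the helper names `logit_margin` and `margin_weighted_eps`, the `tanh` form, the `alpha` parameter, the base budget of 8/255, and the direction of the weighting (a larger margin receives a larger budget). It should not be read as the MWPB formula.

```python
import math


def logit_margin(logits, label):
    """Margin of a natural example: true-class logit minus the largest other logit.

    A small or negative margin suggests the example sits close to (or across)
    the decision boundary, i.e. it is more vulnerable.
    """
    true_logit = logits[label]
    best_other = max(v for i, v in enumerate(logits) if i != label)
    return true_logit - best_other


def margin_weighted_eps(margin, base_eps=8 / 255, alpha=0.5):
    """Hypothetical weighting: scale base_eps by a factor in (1 - alpha, 1 + alpha).

    ASSUMPTION: the tanh form and the direction (larger margin -> larger budget)
    are illustrative choices, not taken from the source text.
    """
    weight = 1.0 + alpha * math.tanh(margin)
    return base_eps * weight


# Usage: assign one perturbation radius per natural example in a small batch.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.3, 0.2], [-1.0, 2.5, 0.0]]
batch_labels = [0, 0, 1]

for logits, label in zip(batch_logits, batch_labels):
    m = logit_margin(logits, label)
    eps = margin_weighted_eps(m)
    print(f"margin={m:+.2f}  eps={eps:.4f}")
```

In an AT loop, the resulting per-example radius would replace the single fixed bound when crafting each adversarial example. A standard-deviation-based variant would follow the same pattern with a different vulnerability score.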