Convolutional neural network classifiers (CNNs) are susceptible to adversarial attacks, which perturb original samples to fool classifiers such as an autonomous vehicle's road sign image classifier. CNNs also lack invariance under symmetry: they can classify a sample and its symmetric counterpart differently. Taken together, the lack of adversarial robustness and the lack of invariance mean that a symmetric transformation of an adversarial sample can be classified differently from the incorrectly classified adversarial sample itself. Could symmetric adversarial samples revert to their correct classification? This paper answers the question by designing a symmetry defense. Against adversaries unaware of the defense, it applies pixel inversion or a horizontal flip to adversarial samples before classification. Against adversaries aware of the defense, it constructs a Klein four symmetry subgroup that includes the horizontal flip and pixel inversion symmetries. The defense uses the subgroup symmetries in accuracy evaluation and relies on the subgroup closure property to confine the transformations that an adaptive adversary can apply before or after generating the adversarial sample. Without changing the preprocessing, parameters, or model, the proposed symmetry defense counters Projected Gradient Descent (PGD) and AutoAttack attacks with near-default accuracies on ImageNet. Without using attack knowledge or adversarial samples, it outperforms the current best defense, which trains on adversarial samples. The defense maintains and even improves the classification accuracy of non-adversarial samples.
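The closure property of the Klein four subgroup mentioned above can be checked numerically. The following is a minimal sketch, not the paper's implementation. It assumes images are NumPy arrays in channel-height-width layout with pixel values in [0, 1], so that pixel inversion is `1 - x`. The names `GROUP`, `cayley_table`, and `classify_symmetric`, and the `classifier` callable, are hypothetical and introduced only for illustration.

```python
import numpy as np

# Assumption: images are float arrays shaped (C, H, W) with values in [0, 1].

def identity(x):
    return x

def hflip(x):
    # Horizontal flip: reverse the width axis (last axis).
    return x[..., ::-1]

def invert(x):
    # Pixel inversion for values in [0, 1].
    return 1.0 - x

def hflip_invert(x):
    # Composition of the two symmetries.
    return invert(hflip(x))

# The four elements of the (hypothetically named) symmetry subgroup.
GROUP = {"e": identity, "h": hflip, "i": invert, "hi": hflip_invert}

def cayley_table(sample):
    """Map each pair (a, b) to the group element equal to a applied after b."""
    table = {}
    for a, fa in GROUP.items():
        for b, fb in GROUP.items():
            composed = fa(fb(sample))
            matches = [n for n, f in GROUP.items()
                       if np.allclose(composed, f(sample))]
            # Raises IndexError if the composition leaves the set,
            # i.e. if closure fails.
            table[(a, b)] = matches[0]
    return table

def classify_symmetric(classifier, x, name="h"):
    """Apply one subgroup symmetry to x, then call a hypothetical classifier."""
    return classifier(GROUP[name](x))

rng = np.random.default_rng(0)
sample = rng.random((3, 4, 4))
table = cayley_table(sample)
for (a, b), c in sorted(table.items()):
    print(f"{a} after {b} = {c}")
```

Because every composition lands back in the same four-element set, and every element is its own inverse, an adversary who applies any sequence of these symmetries before or after generating a sample ends up with one of only four variants. This is the sense in which closure confines the transformations available to an adaptive adversary.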