Symmetry Defense Against CNN Adversarial Perturbation Attacks

This paper uses symmetry to make Convolutional Neural Network classifiers (CNNs) robust against adversarial perturbation attacks. Such attacks add perturbation to original images to generate adversarial images that fool classifiers, such as the road sign classifiers of autonomous vehicles. Although symmetry is a pervasive aspect of the natural world, CNNs do not handle symmetry well. For example, a CNN can classify an image differently from its mirror image. For an adversarial image that is misclassified with a wrong label $l_w$, this inability to handle symmetry means that a symmetric adversarial image can be classified differently from the wrong label $l_w$. Beyond that, we find that the classification of a symmetric adversarial image reverts to the correct label. To classify an image when adversaries are unaware of the defense, we apply symmetry to the image and use the classification label of the symmetric image. To classify an image when adversaries are aware of the defense, we use mirror symmetry and pixel inversion symmetry to form a symmetry group. We apply all the group symmetries to the image and decide on the output label based on the agreement of any two of the classification labels of the symmetric images. Adaptive attacks fail because they need to rely on loss functions that use conflicting CNN output values for symmetric images. Without attack knowledge, the proposed symmetry defense succeeds against both gradient-based and random-search attacks, with up to near-default accuracies for ImageNet. The defense even improves the classification accuracy of original images.
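The group-based decision rule described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes images are NumPy arrays of shape (H, W, C) with intensities scaled to [0, 1], that mirror symmetry means a horizontal flip, and that `classify` is a hypothetical callable standing in for the CNN. The abstract does not say what happens when no two labels agree or when two pairs disagree, so returning `None` in the first case and taking the first most common label in the second are assumptions made here.

```python
from collections import Counter

import numpy as np


def horizontal_flip(image):
    """Mirror symmetry: reverse the width axis of an (H, W, C) image."""
    return image[:, ::-1, :]


def invert_pixels(image):
    """Pixel inversion symmetry, assuming intensities lie in [0, 1]."""
    return 1.0 - image


def symmetry_group(image):
    """Return the four images given by identity, flip, inversion, and both."""
    flipped = horizontal_flip(image)
    return [image, flipped, invert_pixels(image), invert_pixels(flipped)]


def defended_label(classify, image):
    """Classify every symmetric image and return a label that at least two agree on.

    `classify` is a hypothetical stand-in for the CNN: it maps one image to one label.
    Returns None when all four labels differ (an assumption, not from the paper).
    """
    labels = [classify(variant) for variant in symmetry_group(image)]
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None
```

Both transforms are their own inverses and commute with each other, which is why these four images are closed under further flips or inversions and can be treated as a group.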
