The ability to fool deep learning classifiers with tiny perturbations of the input has led to the development of adversarial training, in which the loss on adversarial examples is minimized in addition to the loss on the training examples. While adversarial training improves the robustness of the learned classifiers, the procedure is computationally expensive, sensitive to hyperparameters, and may still leave the classifier vulnerable to other types of small perturbations. In this paper we analyze the adversarial robustness of the 1 Nearest Neighbor (1NN) classifier and compare its performance to adversarial training. We prove that, under reasonable assumptions, the 1NN classifier will be robust to {\em any} small image perturbation of the training images and will give high adversarial accuracy on test images as the number of training examples goes to infinity. In experiments with 45 different binary image classification problems taken from CIFAR10, we find that 1NN outperforms TRADES (a powerful adversarial training algorithm) in terms of average adversarial accuracy. In additional experiments with 69 pretrained robust models for CIFAR10, we find that 1NN outperforms almost all of them in terms of robustness to perturbations that are only slightly different from those seen during training. Taken together, our results suggest that modern adversarial training methods still fall short of the robustness of the simple 1NN classifier. Our code can be found at https://github.com/amirhagai/On-Adversarial-Training-And-The-1-Nearest-Neighbor-Classifier
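The robustness claim for training images can be illustrated with a small sketch. It is not the paper's proof or its released code; it assumes L2 distance, a toy two-cluster dataset in place of CIFAR10, and hypothetical helper names (`predict_1nn`, `certified_radius`, `random_perturbation`). The idea rests on the triangle inequality: if a training point is perturbed by less than half its distance to the closest differently labeled training point, the original point remains closer than any point of another class, so the 1NN label cannot change.

```python
import math
import random

def predict_1nn(train_x, train_y, query):
    """Return the label of the training point closest to `query` (L2 distance)."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[nearest]

def certified_radius(train_x, train_y, i):
    """Half the L2 distance from training point i to the closest differently labeled point."""
    gap = min(math.dist(train_x[i], x)
              for x, y in zip(train_x, train_y) if y != train_y[i])
    return gap / 2.0

def random_perturbation(dim, norm, rng):
    """Sample a vector with the given L2 norm in a random direction."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = norm / math.sqrt(sum(d * d for d in direction))
    return [d * scale for d in direction]

# Toy data (an assumption for illustration): two clusters in 8 dimensions.
rng = random.Random(0)
dim = 8
train_x = [[rng.gauss(c, 0.3) for _ in range(dim)]
           for c in (0.0, 2.0) for _ in range(20)]
train_y = [0] * 20 + [1] * 20

# Perturb every training point just inside its radius and count label changes.
flips = 0
for i, (x, y) in enumerate(zip(train_x, train_y)):
    radius = certified_radius(train_x, train_y, i)
    for _ in range(50):
        delta = random_perturbation(dim, 0.99 * radius, rng)
        perturbed = [a + b for a, b in zip(x, delta)]
        flips += predict_1nn(train_x, train_y, perturbed) != y

print("label flips inside the radius:", flips)
```

Random perturbations only sample the ball around each point, but the triangle-inequality argument covers every perturbation inside it, including worst-case ones. The radius here depends on how well separated the classes are in the training set, which is the kind of condition a formal guarantee would need to state as an assumption.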