An adversarial example (AE) is an attack on machine learning models: it is crafted by adding an imperceptible perturbation to the input data so that the model misclassifies it. In this paper, we investigate an upper bound on the probability that an AE succeeds against Gaussian Process (GP) classification. We prove a new upper bound that depends on the AE's perturbation norm, the kernel function used in the GP, and the distance between the closest pair of training samples with different labels. Surprisingly, the bound holds regardless of the distribution of the sample dataset. We confirm our theoretical result experimentally on ImageNet. In addition, we show that changing the parameters of the kernel function changes the upper bound on the probability of successful AEs.
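The abstract does not state the bound itself, so the sketch below only computes two of the quantities it is said to depend on. It assumes Euclidean distance between samples and uses a squared-exponential (RBF) kernel purely as an illustrative choice; the paper's kernel may differ. The function names `closest_opposite_label_distance` and `rbf_kernel` are hypothetical.

```python
import numpy as np


def closest_opposite_label_distance(X, y):
    """Smallest Euclidean distance between two training samples with different labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances via broadcasting; shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only pairs whose labels differ (this also excludes the diagonal).
    mask = y[:, None] != y[None, :]
    if not mask.any():
        raise ValueError("the training set must contain at least two distinct labels")
    return float(dist[mask].min())


def rbf_kernel(a, b, length_scale=1.0):
    """Illustrative RBF kernel: k(a, b) = exp(-||a - b||^2 / (2 * length_scale^2))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * length_scale ** 2)))


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    y = [0, 0, 1]
    print(closest_opposite_label_distance(X, y))
    # Kernel similarity between a clean input and a copy perturbed by norm 2,
    # for two length scales: a larger length scale gives a higher similarity.
    for ell in (1.0, 2.0):
        print(ell, rbf_kernel([0.0, 0.0], [2.0, 0.0], length_scale=ell))
```

For the toy data above, the closest differently labeled pair is (1, 0) and (3, 0), at distance 2.0, and the RBF similarity across a perturbation of norm 2 with length scale 1 is exp(-2) ≈ 0.135. The loop shows, under these assumptions, how a kernel parameter alters the similarity the GP assigns to a clean input and its perturbed copy, which is one way kernel parameters could enter such a bound.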