An adversarial example (AE) is an attack on machine learning models, crafted by adding an imperceptible perturbation to the input so as to induce misclassification. In this paper, we investigate the upper bound of the probability of successful AEs based on Gaussian process (GP) classification, a probabilistic inference model. We prove a new upper bound on the probability of a successful AE attack that depends on the AE's perturbation norm, the kernel function used in the GP, and the distance between the closest pair of points with different labels in the training dataset. Surprisingly, the upper bound holds regardless of the distribution of the sample dataset. We confirm our theoretical result through experiments on ImageNet. In addition, we show that changing the parameters of the kernel function changes the upper bound of the probability of successful AEs.
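The bound described above is stated in terms of quantities that can be computed directly from the data and the kernel. As a minimal sketch of those ingredients, the snippet below evaluates an RBF kernel (a common choice in GP classification; the specific kernel and its `gamma` parameter are assumptions, not taken from the paper) and finds the distance between the closest pair of training points with different labels:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).

    gamma is a hypothetical parameter choice; in the abstract's setting,
    varying such kernel parameters changes the upper bound on the
    probability of a successful AE.
    """
    return float(np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

def min_interclass_distance(X, labels):
    """Smallest Euclidean distance between any pair of points with different labels.

    This is the training-set quantity the bound depends on, alongside the
    AE's perturbation norm and the kernel.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.inf
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if labels[i] != labels[j]:
                d = min(d, float(np.linalg.norm(X[i] - X[j])))
    return d

# Toy dataset: two points of class 0 and one of class 1.
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
y = [0, 0, 1]
print(min_interclass_distance(X, y))   # distance between [1,0] and [3,0]
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # kernel of a point with itself is 1
```

The point of the sketch is only that both quantities are computable without any knowledge of the data distribution, consistent with the abstract's claim that the bound is distribution-free.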