Most machine learning models are vulnerable to adversarial examples, which raises security concerns about these models. Adversarial examples are crafted by applying subtle but intentionally worst-case modifications to examples from the dataset, leading the model to output an answer different from that for the original example. In this paper, adversarial examples are formed in the exactly opposite manner: they are significantly different from the original examples yet result in the same answer. We propose a novel set of algorithms to produce such adversarial examples, including the negative iterative fast gradient sign method (NI-FGSM) and the negative iterative fast gradient method (NI-FGM), along with their momentum variants: the negative momentum iterative fast gradient sign method (NMI-FGSM) and the negative momentum iterative fast gradient method (NMI-FGM). Adversarial examples constructed by these methods could be used to attack machine learning systems in certain situations. Moreover, our results show that adversarial examples are not merely distributed in the neighbourhood of the examples from the dataset; instead, they are distributed extensively in the sample space.
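To make the naming concrete, the following is a minimal sketch of one plausible reading of NI-FGSM: iterative FGSM with the sign-gradient step negated, so each iteration decreases the loss for the originally predicted label and the example drifts far from the original input while the prediction is preserved. The toy linear softmax classifier, its random weights, and the hyper-parameters alpha and steps are hypothetical illustrations introduced here, not the paper's experimental setup or exact update rule.

```python
# Sketch only: assumes NI-FGSM = iterative FGSM with a negated (descent) step,
# so the predicted label stays fixed while the input moves far from the original.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad(W, b, x, y):
    """Cross-entropy loss of a toy linear softmax classifier and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    loss = -np.log(p[y] + 1e-12)
    grad_x = W.T @ (p - np.eye(len(p))[y])  # d(loss)/dx for the linear model
    return loss, grad_x

def ni_fgsm(W, b, x, y, alpha=0.05, steps=50):
    """Presumed NI-FGSM update: step against the sign of the loss gradient,
    lowering the loss for label y while moving away from the original x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(W, b, x_adv, y)
        x_adv = x_adv - alpha * np.sign(g)  # negative (descent) direction
    return x_adv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(3, 8)), rng.normal(size=3)
    x = rng.normal(size=8)
    y = int(np.argmax(softmax(W @ x + b)))  # original prediction
    x_adv = ni_fgsm(W, b, x, y)
    print("L2 distance from original:", np.linalg.norm(x_adv - x))
    print("same prediction:", int(np.argmax(softmax(W @ x_adv + b))) == y)
```

Under this reading, the momentum variants (NMI-FGSM, NMI-FGM) would accumulate the gradient direction across iterations before taking the negated step, analogous to how momentum is added to standard iterative attacks; the exact formulation is specified in the body of the paper.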