In this paper, we propose a new framework for detecting adversarial examples, motivated by two observations: random components can improve the smoothness of predictors, and they make it easier to simulate the output distribution of a deep neural network. Building on these observations, we propose a novel Bayesian adversarial example detector, BATer for short, to improve the performance of adversarial example detection. Specifically, we study how the distribution of hidden-layer outputs differs between natural and adversarial examples, and we propose to use the randomness of a Bayesian neural network to simulate the hidden-layer output distribution and to leverage the dispersion of that distribution to detect adversarial examples. The advantage of a Bayesian neural network is that its output is stochastic, whereas a deep neural network without random components has no such property. Empirical results on several benchmark datasets against popular attacks show that BATer outperforms state-of-the-art detectors in adversarial example detection.
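To make the idea of simulating a hidden-layer output distribution concrete, the following is a minimal toy sketch, not the BATer implementation. It assumes a single hidden layer with Gaussian-perturbed weights, a tanh activation, and a dispersion score defined as the mean per-unit standard deviation over repeated stochastic forward passes. The names `stochastic_hidden`, `dispersion_score`, and `is_flagged`, the threshold rule, and the direction of the comparison are all hypothetical choices made for illustration only.

```python
import numpy as np


def stochastic_hidden(x, w_mean, w_std, rng, n_samples=32):
    """Collect hidden-layer outputs over repeated stochastic forward passes.

    Assumption: weights are drawn as w_mean + w_std * N(0, 1) on every pass,
    a simple stand-in for sampling from a Bayesian network's weight posterior.
    """
    outputs = []
    for _ in range(n_samples):
        w = w_mean + w_std * rng.standard_normal(w_mean.shape)
        outputs.append(np.tanh(x @ w))
    return np.stack(outputs)  # shape: (n_samples, hidden_dim)


def dispersion_score(samples):
    """Toy dispersion measure: mean per-unit standard deviation across passes."""
    return float(samples.std(axis=0).mean())


def is_flagged(score, threshold):
    """Hypothetical decision rule: flag inputs whose dispersion exceeds a threshold."""
    return score > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_mean = rng.standard_normal((4, 3))  # 4 input features, 3 hidden units
    x = rng.standard_normal(4)
    samples = stochastic_hidden(x, w_mean, w_std=0.1, rng=rng)
    print("dispersion:", dispersion_score(samples))
```

In such a setup, the threshold would have to be calibrated on held-out natural examples. A deterministic network corresponds to `w_std = 0`, for which every pass returns the same output and the dispersion is zero, which is why the stochasticity is needed for this kind of score.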