Deep Neural Networks (DNNs) are vulnerable to adversarial examples, and adversarial attack models such as DeepFool are on the rise, outpacing adversarial example detection techniques. This paper presents a new adversarial example detector that outperforms state-of-the-art detectors in identifying the latest adversarial attacks on image datasets. Specifically, we propose to use sentiment analysis for adversarial example detection, an approach justified by the progressively manifesting impact of an adversarial perturbation on the hidden-layer feature maps of a DNN under attack. Accordingly, we design a modularized embedding layer with minimal learnable parameters that embeds the hidden-layer feature maps into word vectors and assembles them into sentences ready for sentiment analysis. Extensive experiments demonstrate that the new detector consistently surpasses state-of-the-art detection algorithms in detecting the latest attacks launched against ResNet and Inception neural networks on the CIFAR-10, CIFAR-100 and SVHN datasets. The detector has only about 2 million parameters and takes less than 4.6 milliseconds to detect an adversarial example generated by the latest attack models on a Tesla K80 GPU.
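To make the idea of turning hidden-layer feature maps into a "sentence" concrete, the following is a minimal sketch and not the embedding layer of the paper, whose internals the abstract does not specify. It assumes that each hidden layer's feature map is reduced by global average pooling and then mapped by a hypothetical per-layer linear projection to a word vector of fixed size; the names `global_average_pool`, `make_projection`, `embed_layer`, `build_sentence` and `WORD_DIM` are illustrative only, and the random feature maps stand in for real DNN activations. The sentiment classifier that would consume the sentence is omitted.

```python
import random


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def make_projection(in_dim, word_dim, rng):
    """Hypothetical linear projection (word_dim x in_dim), randomly initialised.

    In a trained detector these weights would be learned; here they are random.
    """
    scale = 1.0 / in_dim ** 0.5
    return [
        [rng.uniform(-scale, scale) for _ in range(in_dim)]
        for _ in range(word_dim)
    ]


def embed_layer(feature_map, weights):
    """Map one hidden-layer feature map to one word vector (assumed design)."""
    pooled = global_average_pool(feature_map)
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def build_sentence(feature_maps, projections):
    """One word per hidden layer, ordered from shallow to deep."""
    return [embed_layer(fm, w) for fm, w in zip(feature_maps, projections)]


rng = random.Random(0)
WORD_DIM = 5  # assumed word-vector size, chosen only for illustration

# Stand-in feature maps for three hidden layers, given as (C, H, W) shapes.
shapes = [(4, 8, 8), (8, 4, 4), (16, 2, 2)]
feature_maps = [
    [[[rng.random() for _ in range(w)] for _ in range(h)] for _ in range(c)]
    for c, h, w in shapes
]
projections = [make_projection(c, WORD_DIM, rng) for c, _, _ in shapes]

sentence = build_sentence(feature_maps, projections)
print(len(sentence), "words of dimension", len(sentence[0]))
```

Under these assumptions, every layer contributes one word of the same dimension regardless of its channel count or spatial size, so layers of different shapes can be lined up as a single sequence for a text-style classifier.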