Convolutional neural networks (CNNs) define the state of the art on many perceptual tasks. However, current CNN approaches remain largely vulnerable to adversarial perturbations of the input, which are crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. In the latter case, a small "detector" is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and lightweight detector that leverages recent findings on the relation between a network's local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state of the art in adversarial detection by a significant margin and reach almost perfect F1-scores for several networks and datasets. Sources are available at: https://github.com/adverML/multiLID
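For readers unfamiliar with LID, the following is a minimal sketch of the widely used maximum-likelihood LID estimator, computed from the distances between a sample and its k nearest neighbours. It is an illustration only and not the detector proposed in this work: the re-interpretation and adaptations mentioned above are not reproduced here, and the function names `lid_mle` and `knn_distances`, the Euclidean distance, and the choice of k are assumptions made for this example.

```python
import numpy as np


def knn_distances(x: np.ndarray, batch: np.ndarray, k: int) -> np.ndarray:
    """Return the k smallest non-zero Euclidean distances from x to rows of batch.

    Hypothetical helper for this sketch; zero distances (e.g. x itself
    appearing in the batch) are dropped so that the logarithm is defined.
    """
    dist = np.linalg.norm(batch - x, axis=1)
    dist = dist[dist > 0]
    return np.sort(dist)[:k]


def lid_mle(dists: np.ndarray) -> float:
    """Maximum-likelihood LID estimate from k nearest-neighbour distances.

    Computes -1 / mean(log(r_i / r_k)), where r_k is the largest of the
    k distances. Degenerate input (all distances equal) is not handled.
    """
    d = np.sort(np.asarray(dists, dtype=float))
    return float(-1.0 / np.mean(np.log(d / d[-1])))
```

In a detection setting, such estimates would typically be computed on activations of several network layers and passed as features to a small binary classifier. That pipeline is described here only in general terms and is an assumption about usage, not a description of the released code.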