Artificial neural networks are prone to being fooled by carefully perturbed inputs that cause egregious misclassifications. These \textit{adversarial} attacks have been the focus of extensive research, as have ways to detect and defend against them. We introduce a novel approach to detecting and interpreting adversarial attacks from a graph perspective. For an image, benign or adversarial, we study how a neural network's architecture can induce an associated graph. We analyze this graph and introduce specific measures used to predict and interpret adversarial attacks. We show that graph-based approaches help to investigate the inner workings of adversarial attacks.