Adversarial attacks on a convolutional neural network (CNN), which inject human-imperceptible perturbations into an input image, can fool even a high-performance CNN into making incorrect predictions. The success of these attacks raises serious concerns about the robustness of CNNs and prevents them from being used in safety-critical applications such as medical diagnosis and autonomous driving. Our work introduces a visual analytics approach to understanding adversarial attacks by answering two questions: (1) which neurons are more vulnerable to attacks, and (2) which image features do these vulnerable neurons capture during the prediction? For the first question, we introduce multiple perturbation-based measures that break down the attacking magnitude into individual CNN neurons and rank the neurons by their vulnerability levels. For the second, we identify image features (e.g., cat ears) that highly stimulate a user-selected neuron, in order to augment and validate the neuron's responsibility. Furthermore, we support interactive exploration of a large number of neurons through hierarchical clustering based on the neurons' roles in the prediction. To this end, we design a visual analytics system that incorporates visual reasoning for interpreting adversarial attacks. We validate the effectiveness of our system through multiple case studies as well as feedback from domain experts.
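To make the first question concrete, the following is a minimal sketch of one possible perturbation-based measure, assuming per-neuron activations have already been recorded for the same images before and after an attack. The measure used here (mean absolute activation change) and the function names `vulnerability_scores` and `rank_neurons` are illustrative assumptions, not the measures defined in this work.

```python
def vulnerability_scores(clean_acts, adv_acts):
    """Mean absolute activation change per neuron (a hypothetical measure).

    clean_acts, adv_acts: lists of rows, one row per image,
    one value per neuron, recorded at the same layer.
    """
    n_images = len(clean_acts)
    n_neurons = len(clean_acts[0])
    scores = [0.0] * n_neurons
    for clean_row, adv_row in zip(clean_acts, adv_acts):
        for j in range(n_neurons):
            # Accumulate how far the attack moved neuron j on this image.
            scores[j] += abs(adv_row[j] - clean_row[j]) / n_images
    return scores


def rank_neurons(scores):
    """Return neuron indices ordered from most to least perturbed."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy activations for 2 images and 3 neurons (made-up numbers).
clean = [[0.2, 1.0, 0.5], [0.1, 0.9, 0.4]]
adv = [[0.3, 0.1, 0.5], [0.1, 0.2, 0.6]]

scores = vulnerability_scores(clean, adv)
ranking = rank_neurons(scores)
print(ranking)
```

In this toy input the second neuron (index 1) changes the most under the attack, so it is ranked first. A real analysis would read activations from the network itself and would likely combine several such measures.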