Deep Learning (DL) is being applied in many domains, including safety-critical applications such as autonomous driving. It is therefore important to ensure the robustness of these methods and to counteract the unpredictable behaviors caused by adversarial attacks. In this paper, we use gradient heatmaps to analyze the response characteristics of the VGG-16 model when input images are mixed with adversarial noise and with statistically similar Gaussian random noise. In particular, we compare the network response layer by layer to determine where errors occur. We derive several findings. First, compared to Gaussian random noise, intentionally generated adversarial noise causes severe behavior deviation by diverting the network's area of concentration. Second, in many cases, adversarial examples only need to compromise a few intermediate blocks to mislead the final decision. Third, our experiments reveal that specific blocks are more vulnerable and easier for adversarial examples to exploit. Finally, we demonstrate that the layers $Block4\_conv1$ and $Block5\_conv1$ of the VGG-16 model are more susceptible to adversarial attacks. Our work could provide valuable insights for developing more reliable Deep Neural Network (DNN) models.
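The comparison described in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: "statistically similar" is taken to mean Gaussian noise with the same mean and standard deviation as the adversarial perturbation, and layer-wise deviation is measured by cosine similarity between flattened heatmaps. The helper names, the threshold of 0.9, and the toy layer names are hypothetical and are not the authors' implementation.

```python
import math
import random
import statistics


def matched_gaussian_noise(perturbation, seed=0):
    """Draw Gaussian noise with the same mean and standard deviation as
    `perturbation`, a flat list of per-pixel values.

    Assumption: matching the first two moments is one possible reading of
    "statistically similar"; the abstract does not define the term.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(perturbation)
    sigma = statistics.pstdev(perturbation)
    return [rng.gauss(mu, sigma) for _ in perturbation]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length flat heatmaps (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def first_deviating_layer(clean_maps, perturbed_maps, threshold=0.9):
    """Walk the layers in order and return the first layer name whose
    perturbed heatmap falls below `threshold` similarity to the clean one,
    or None if no layer deviates.

    Both arguments map layer name -> flattened heatmap; the threshold is an
    arbitrary illustrative value.
    """
    for name, clean in clean_maps.items():
        if cosine_similarity(clean, perturbed_maps[name]) < threshold:
            return name
    return None
```

In practice the heatmaps would come from a deep learning framework and would be array-valued; plain lists are used here only to keep the sketch dependency-free. Running `first_deviating_layer` once with heatmaps from the adversarial input and once with heatmaps from the matched Gaussian input gives the kind of side-by-side, per-layer comparison the abstract describes.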