Although deep neural networks achieve excellent results in many fields, they are susceptible to adversarial samples, which cause erroneous judgments. Feature-level attacks are an effective attack type: they target the features learned in the hidden layers to improve transferability across different models. However, transferability has been observed to depend heavily on the accuracy of the neuron importance estimates. In this paper, we propose a double adversarial neuron attribution attack method, termed DANAA, to obtain more accurate feature importance estimates. In our method, the model outputs are attributed to the middle layer along an adversarial non-linear path. The goal is to measure the weight of individual neurons and retain the features that matter most for transferability. Extensive experiments on benchmark datasets demonstrate the state-of-the-art performance of our method. Our code is available at: https://github.com/Davidjinzb/DANAA
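The idea of attributing a model output to middle-layer neurons along a non-linear, gradient-driven path can be illustrated with a minimal sketch. This is not the DANAA implementation (the linked repository holds that). The toy two-layer network, the sign-gradient path update, the trapezoidal accumulation, and all names (`hidden`, `output`, `neuron_attribution`, `steps`, `step_size`) are assumptions made purely for illustration.

```python
import numpy as np

def hidden(W1, x):
    """Middle-layer activations of a toy network (assumed architecture)."""
    return np.tanh(W1 @ x)

def output(w2, h):
    """Scalar model output computed from the middle-layer activations."""
    return float(w2 @ (h * h))

def neuron_attribution(W1, w2, x0, steps=20, step_size=0.05):
    """Accumulate per-neuron attributions along a gradient-driven input path.

    The path is non-linear because the gradient is re-evaluated at every
    step. This update rule is an illustrative assumption, not the paper's.
    """
    x = x0.copy()
    h = hidden(W1, x)
    attr = np.zeros_like(h)
    for _ in range(steps):
        dy_dh = 2.0 * w2 * h                       # d(output)/d(neuron)
        dy_dx = W1.T @ ((1.0 - h * h) * dy_dh)     # chain rule back to the input
        x_next = x - step_size * np.sign(dy_dx)    # step against the output gradient
        h_next = hidden(W1, x_next)
        dy_dh_next = 2.0 * w2 * h_next
        # Trapezoidal estimate of the path integral of dy/dh over this step.
        attr += 0.5 * (dy_dh + dy_dh_next) * (h_next - h)
        x, h = x_next, h_next
    return attr, x

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 8))   # 8 inputs -> 5 middle-layer neurons
w2 = rng.normal(size=5)
x0 = rng.normal(size=8)

attr, x_end = neuron_attribution(W1, w2, x0)
ranking = np.argsort(-np.abs(attr))  # neurons ordered by attribution magnitude
```

In this toy setting the attributions sum to the total change in the output between the start and end of the path, which gives a simple sanity check on the accumulation. A feature-level attack would then weight the middle-layer features by such scores, but how DANAA does so is not specified in the abstract.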