Generating adversarial inputs has become crucial to establishing the robustness and trustworthiness of deep neural networks, especially when they are deployed in safety-critical domains such as autonomous vehicles and precision medicine. The problem poses several practical challenges, however: existing approaches scale poorly to large networks, and the adversarial inputs they produce often lack important qualities such as naturalness and output-impartiality. Adversarial input generation shares its end goal with neural network patching, in which small changes to some of a network's weights must be found so that the modified network produces the desired output on a given set of inputs. We exploit this connection by deriving an adversarial input from a patch, based on the observation that the effect of changing the weights can also be achieved by changing the inputs instead. This paper thus presents a novel way to generate input perturbations that are adversarial for a given network by using an efficient network patching technique. We observe that the proposed method is significantly more effective than prior state-of-the-art techniques.
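To make the weight-to-input observation concrete, the following is a minimal sketch for the simplest possible case: a single linear first layer with weight matrix `W`. It is not the method proposed in this paper. It assumes a patch `dW` to that layer is already available, and it solves `W @ (x + delta_x) = (W + dW) @ x` for an input shift `delta_x` by least squares. The names `input_shift_from_patch`, `W`, `dW`, and `x` are hypothetical, and the random values stand in for a real network and patch.

```python
import numpy as np


def input_shift_from_patch(W, dW, x):
    """Find delta_x such that W @ (x + delta_x) approximates (W + dW) @ x.

    Illustrative assumption: only the first layer is patched and it is
    linear, so the condition reduces to W @ delta_x = dW @ x.
    """
    target = dW @ x
    # Least-squares solve; for a wide, full-row-rank W this returns the
    # minimum-norm delta_x that satisfies the equation exactly.
    delta_x, *_ = np.linalg.lstsq(W, target, rcond=None)
    return delta_x


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))            # hypothetical layer: 6 inputs, 4 units
dW = 0.01 * rng.normal(size=W.shape)   # hypothetical small weight patch
x = rng.normal(size=6)                 # hypothetical original input

delta_x = input_shift_from_patch(W, dW, x)
patched_pre = (W + dW) @ x             # pre-activation of the patched layer
perturbed_pre = W @ (x + delta_x)      # original layer on the perturbed input
print(np.allclose(patched_pre, perturbed_pre))
```

Because both computations yield the same first-layer pre-activation, every later layer sees identical values, so in this toy setting the perturbed input reproduces the patched network's output on the original network. When the layer has more units than inputs, an exact solution generally does not exist and the least-squares result is only an approximation.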