Recent work has demonstrated that deep neural networks (DNNs) are highly vulnerable to adversarial attacks. Many defense strategies have been proposed against such attacks, among which adversarial training has been demonstrated to be the most effective. However, adversarial training is known to sometimes reduce natural accuracy, and many works therefore focus on optimizing model parameters to address this problem. In contrast to these approaches, in this paper we propose a new approach that improves adversarial robustness by using an external signal rather than model parameters. In the proposed method, a well-optimized universal external signal, called a booster signal, is injected into a region outside the image so that it does not overlap the original content. The booster signal improves both adversarial robustness and natural accuracy. It is optimized jointly with the model parameters, step by step. Experimental results show that the booster signal improves both natural and robust accuracy over recent state-of-the-art adversarial training methods. Moreover, optimizing the booster signal is general and flexible enough to be applied to any existing adversarial training method.
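As an illustration of what injecting a signal "outside the image" could look like, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the booster is a frame of padding around the image; the names `apply_booster` and `border_mask`, the parameter `pad`, and the `(N, C, H, W)` array layout are all hypothetical choices made for this example.

```python
import numpy as np


def apply_booster(images, booster, pad):
    """Place each image at the center of a larger canvas whose border holds the booster.

    images:  array of shape (N, C, H, W)
    booster: array of shape (C, H + 2*pad, W + 2*pad), shared by all images
    pad:     border width in pixels (hypothetical parameter)
    """
    n, c, h, w = images.shape
    if booster.shape != (c, h + 2 * pad, w + 2 * pad):
        raise ValueError("booster shape does not match the padded image size")
    # Copy the single universal booster once per image in the batch.
    canvas = np.broadcast_to(booster, (n,) + booster.shape).copy()
    # Overwrite the interior so the original content is left untouched.
    canvas[:, :, pad:pad + h, pad:pad + w] = images
    return canvas


def border_mask(c, h, w, pad):
    """Boolean mask that is True only on the border region occupied by the booster."""
    mask = np.ones((c, h + 2 * pad, w + 2 * pad), dtype=bool)
    mask[:, pad:pad + h, pad:pad + w] = False
    return mask
```

Under this assumption, a training loop would alternate an update of the model parameters with an update of `booster`, applying the booster update only where `border_mask` is True so that the signal never overlaps the image content.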