Adaptive test-time defenses are used to improve the robustness of deep neural networks to adversarial examples. However, existing methods significantly increase inference time because they perform additional optimization on the model parameters or the input at test time. In this work, we propose a novel adaptive test-time defense strategy that is easy to integrate with any existing (robust) training procedure and requires no additional test-time computation. Based on a notion of feature robustness that we introduce, the key idea is to project the trained model onto the most robust feature space, thereby reducing its vulnerability to adversarial attacks in non-robust directions. We theoretically show that the top eigenspace of the feature matrix is more robust for a generalized additive model, and we support our argument for a large-width neural network through its Neural Tangent Kernel (NTK) equivalence. We conduct extensive experiments on the CIFAR-10 and CIFAR-100 datasets for several robustness benchmarks, including the state-of-the-art methods in RobustBench, and observe that the proposed method outperforms existing adaptive test-time defenses at a much lower computational cost.
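The projection idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the "feature matrix" is a matrix of penultimate-layer features with one row per training sample, that the model ends in a linear head, and that robustness is ranked by the eigenvalues of the feature Gram matrix. The function name `project_head_to_top_eigenspace` and the parameter `k` are hypothetical.

```python
import numpy as np

def project_head_to_top_eigenspace(features, weights, k):
    """Project a linear head onto the top-k eigenspace of the feature matrix.

    features: (n, d) array, one feature vector per sample (assumed layout).
    weights:  (d, c) array, weights of the final linear layer (assumed).
    k:        number of top eigen-directions to keep (hypothetical parameter).
    """
    gram = features.T @ features             # (d, d), symmetric PSD
    _, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    top = eigvecs[:, -k:]                    # (d, k) top-k eigenvectors
    projector = top @ top.T                  # orthogonal projector, rank k
    return projector @ weights               # same shape as `weights`

# Synthetic data for illustration only; no real model or dataset is used.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16)) * np.linspace(3.0, 0.1, 16)
weights = rng.normal(size=(16, 10))
projected = project_head_to_top_eigenspace(features, weights, k=4)
```

Because the projected weights have the same shape as the original ones, they can replace the head once after training, so the forward pass at test time costs the same as before. How `k` should be chosen is not stated in the abstract and is left open here.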