Local robustness verification can verify that a neural network is robust with respect to any perturbation of a specific input within a certain distance. We call this distance the robustness radius. We observe that the robustness radii of correctly classified inputs are much larger than those of misclassified inputs, which include adversarial examples, especially those generated by strong adversarial attacks. Another observation is that the robustness radii of correctly classified inputs often follow a normal distribution. Based on these two observations, we propose to validate the inputs to neural networks via runtime local robustness verification. Experiments show that our approach can protect neural networks from adversarial examples and improve their accuracy.
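The input-validation idea can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. `verify(x, eps)` is a hypothetical stand-in for a sound local robustness verifier (L-inf norm assumed). The radius is estimated by binary search, which assumes certification is monotone in `eps`. The rejection rule (reject when the radius falls below the mean minus `k` standard deviations of the radii of correctly classified inputs) is our assumed way of turning the normal-distribution observation into a threshold. A toy linear classifier, whose L-inf radius is exactly `|w.x + b| / ||w||_1`, stands in for a real network so the sketch runs.

```python
import statistics
from typing import Callable, List, Sequence

# Hypothetical verifier interface (an assumption, not a specific tool's API):
# verify(x, eps) -> True iff the prediction at x is certified unchanged for
# every perturbation delta with ||delta||_inf <= eps.
Verifier = Callable[[Sequence[float], float], bool]


def robustness_radius(x: Sequence[float], verify: Verifier,
                      eps_max: float = 1.0, tol: float = 1e-4) -> float:
    """Binary-search the largest eps the verifier certifies (a lower bound
    on the robustness radius). Assumes certification is monotone in eps."""
    if verify(x, eps_max):
        return eps_max  # radius is at least eps_max; report the cap
    lo, hi = 0.0, eps_max  # hi is never certified; lo is certified or 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if verify(x, mid):
            lo = mid
        else:
            hi = mid
    return lo


def fit_threshold(radii: List[float], k: float = 2.0) -> float:
    """Assumed rule: model radii of correctly classified inputs as
    N(mu, sigma^2) and flag anything below mu - k * sigma."""
    mu = statistics.fmean(radii)
    sigma = statistics.stdev(radii)  # needs at least two radii
    return mu - k * sigma


def calibrate(xs, ys, predict, verify, k: float = 2.0) -> float:
    # Only correctly classified inputs contribute to the fitted distribution.
    radii = [robustness_radius(x, verify)
             for x, y in zip(xs, ys) if predict(x) == y]
    return fit_threshold(radii, k)


def validate(x, verify, threshold: float) -> bool:
    """Accept x only if its robustness radius is not unusually small."""
    return robustness_radius(x, verify) >= threshold


# Toy model (hypothetical): a linear binary classifier sign(w.x + b).
# An L-inf perturbation of size eps moves the logit by at most eps * ||w||_1,
# so this verifier is exact: radius = |w.x + b| / ||w||_1.
def make_linear_model(w, b):
    l1 = sum(abs(wi) for wi in w)

    def logit(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    def predict(x):
        return 1 if logit(x) >= 0 else 0

    def verify(x, eps):
        return abs(logit(x)) > eps * l1

    return predict, verify


predict, verify = make_linear_model([1.0, -2.0], 0.0)
xs = [[0.9, 0.0], [1.2, 0.0], [1.5, 0.0], [0.0, -0.6]]  # radii ~0.3, 0.4, 0.5, 0.4
ys = [1, 1, 1, 1]
threshold = calibrate(xs, ys, predict, verify)
print(validate([1.05, 0.0], verify, threshold))  # radius ~0.35 -> True
print(validate([0.03, 0.0], verify, threshold))  # radius ~0.01 -> False
```

The binary search returns a certified lower bound on the radius. With an incomplete verifier that bound can be smaller than the true radius, which would cause more inputs to be rejected.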