The security and robustness of deep neural networks (DNNs) have become increasingly pressing concerns. This paper aims to provide both a theoretical foundation and a practical solution for ensuring the reliability of DNNs. We explore the concept of Lipschitz continuity to certify the robustness of DNNs against adversarial attacks, which aim to mislead the network by adding imperceptible perturbations to its inputs. We propose a novel algorithm that remaps the input domain into a constrained range, reducing the Lipschitz constant and thereby potentially enhancing robustness. Unlike existing adversarially trained models, whose robustness is improved by introducing additional examples from other datasets or generative models, our method is almost cost-free: it can be integrated with existing models without re-training. Experimental results demonstrate the generalizability of our method, which can be combined with various models and consistently enhances their robustness. Furthermore, our method achieves the best robust accuracy on the CIFAR10, CIFAR100, and ImageNet datasets on the RobustBench leaderboard.
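The core idea can be illustrated with a toy one-dimensional sketch. This is a hypothetical illustration, not the paper's actual algorithm: linearly remapping inputs into a narrower range before a Lipschitz-continuous model scales down the effective Lipschitz constant of the composed pipeline (a remap with slope c composed with an L-Lipschitz function is (c·L)-Lipschitz). The `remap` bounds `[0.25, 0.75]` and the toy `model` are assumed for demonstration only.

```python
import numpy as np

def remap(x, lo=0.25, hi=0.75):
    """Linearly remap inputs from [0, 1] into the narrower range [lo, hi].

    The remap has slope (hi - lo) = 0.5, so composing it with an
    L-Lipschitz function yields a (0.5 * L)-Lipschitz function.
    """
    return lo + (hi - lo) * x

def model(x):
    """Toy 1-Lipschitz 'model' standing in for a trained DNN."""
    return np.abs(x - 0.5)

def lipschitz_estimate(f, n=10000, seed=0):
    """Empirically estimate the Lipschitz constant of f on [0, 1]
    as the maximum slope over random input pairs."""
    rng = np.random.default_rng(seed)
    a, b = rng.random(n), rng.random(n)
    return float(np.max(np.abs(f(a) - f(b)) / (np.abs(a - b) + 1e-12)))

L_plain = lipschitz_estimate(model)                    # ~1.0
L_remap = lipschitz_estimate(lambda x: model(remap(x)))  # ~0.5
print(L_plain, L_remap)
```

Since an adversary's perturbation budget is fixed in the original input space, the smaller constant of the remapped pipeline bounds how much any admissible perturbation can move the output, which is the mechanism behind the claimed robustness gain.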