Great advances in deep neural networks (DNNs) have led to state-of-the-art performance on a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to adversarial attacks, which raises serious concerns about deploying these models in safety-critical applications such as autonomous driving. Different defense approaches have been proposed against adversarial attacks, including: a) empirical defenses, which provide no robustness certification and can usually be broken again by adaptive attacks; and b) certifiably robust approaches, which consist of robustness verification, providing a lower bound on robust accuracy against any attack under certain conditions, and corresponding robust training approaches. In this paper, we systematize certifiably robust approaches and the related practical and theoretical implications and findings. We also provide the first comprehensive benchmark of existing robustness verification and training approaches on different datasets. In particular, we 1) provide a taxonomy of robustness verification and training approaches and summarize the methodologies of representative algorithms, 2) reveal the characteristics, strengths, limitations, and fundamental connections among these approaches, 3) discuss current research progress, theoretical barriers, main challenges, and future directions for certifiably robust approaches for DNNs, and 4) provide an open-source unified platform to evaluate 20+ representative certifiably robust approaches.