Adversarial training is well known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevents them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) techniques, and even when using state-of-the-art computational facilities. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables, comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.