As deep learning models continue to advance and are increasingly deployed in real-world systems, robustness remains a major challenge. Existing certified training methods produce models with high provable robustness guarantees at certain perturbation levels. However, such models suffer from dramatically low standard accuracy, i.e., accuracy on clean, unperturbed data, which makes them impractical. In this work, we take a more realistic perspective: maximizing the robustness of a model at given (high) levels of standard accuracy. To this end, we propose a novel certified training method based on the key insight that training with adaptive certified radii improves both the accuracy and the robustness of the model, advancing the state of the art in accuracy-robustness tradeoffs. We demonstrate the effectiveness of the proposed method on the MNIST, CIFAR-10, and TinyImageNet datasets. In particular, on CIFAR-10 and TinyImageNet, our method yields models with up to two times higher robustness, measured as the average certified radius over the test set, at the same levels of standard accuracy as baseline approaches.
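To make the two quantities being traded off concrete, the following is a minimal sketch of how standard accuracy and average certified radius could be computed from per-sample results. The function names and input format are hypothetical, and the convention that a misclassified sample contributes a radius of zero is an assumption that the abstract does not state; the per-sample radii are assumed to come from some external certification procedure.

```python
from typing import Sequence


def standard_accuracy(correct: Sequence[bool]) -> float:
    """Fraction of clean, unperturbed test samples classified correctly."""
    if not correct:
        raise ValueError("test set must not be empty")
    return sum(correct) / len(correct)


def average_certified_radius(radii: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean certified radius over the whole test set.

    Assumption (not stated in the abstract): a misclassified sample
    contributes a radius of 0, so the average is taken over all samples,
    not only over the correctly classified ones.
    """
    if len(radii) != len(correct):
        raise ValueError("radii and correct must have the same length")
    if not radii:
        raise ValueError("test set must not be empty")
    total = sum(r if ok else 0.0 for r, ok in zip(radii, correct))
    return total / len(radii)


# Hypothetical per-sample results for a four-sample test set.
radii = [0.5, 0.25, 1.0, 0.25]
correct = [True, True, False, True]

accuracy = standard_accuracy(correct)
acr = average_certified_radius(radii, correct)
```

Under this convention, two models can be compared by fixing a target standard accuracy and reporting which one attains the larger average certified radius.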