Randomized smoothing has become a leading method for certifying the robustness of deep classifiers against l_p-norm adversarial perturbations. Current certification approaches, such as data augmentation with Gaussian noise and adversarial training, require expensive training procedures that tune large models for each Gaussian noise level and thus cannot leverage high-performance pre-trained neural networks. In this work, we introduce a novel certifying adapters framework (CAF) that enables and enhances the certification of classifier adversarial robustness. Our approach makes few assumptions about the underlying training algorithm or feature extractor and is thus broadly applicable to different feature extractor architectures (e.g., convolutional neural networks or vision transformers) and smoothing algorithms. We show that CAF (a) enables certification for uncertified models pre-trained on clean datasets and (b) substantially improves the certified accuracy of classifiers smoothed via randomized smoothing and SmoothAdv at multiple radii on CIFAR-10 and ImageNet. We demonstrate that CAF achieves higher certified accuracies than methods based on randomized or denoised smoothing, and that CAF is insensitive to certifying adapter hyperparameters. Finally, we show that an ensemble of adapters enables a single pre-trained feature extractor to defend against a range of noise perturbation scales.
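For readers unfamiliar with the certification procedure the abstract builds on, the following is a minimal illustrative sketch of randomized smoothing in the style of Cohen et al. (2019): the smoothed classifier takes a majority vote of a base classifier under Gaussian noise, and the certified l_2 radius is sigma times the inverse normal CDF of the top-class probability. The toy `base_classifier` and all parameter values here are hypothetical stand-ins, not part of CAF itself.

```python
import numpy as np
from statistics import NormalDist

# Toy base classifier standing in for any pre-trained network (illustrative only):
# predicts class 1 if the mean pixel value is at least 0.5, else class 0.
def base_classifier(x):
    return int(x.mean() >= 0.5)

def smoothed_predict(x, sigma=0.25, n=1000, seed=0):
    """Monte-Carlo estimate of the smoothed prediction g(x): the majority
    vote of the base classifier under Gaussian noise N(0, sigma^2 I),
    plus the certified l_2 radius R = sigma * Phi^{-1}(p_A)."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(2, dtype=int)
    for _ in range(n):
        votes[base_classifier(x + sigma * rng.normal(size=x.shape))] += 1
    top = int(votes.argmax())
    # Empirical top-class probability, clipped away from 1 so inv_cdf is defined.
    # (The actual certification procedure uses a lower confidence bound instead.)
    p_a = min(votes[top] / n, 1.0 - 1e-6)
    radius = sigma * NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return top, radius

# An input far from the toy decision boundary certifies with a positive radius.
label, radius = smoothed_predict(np.full((8, 8), 0.9))
print(label, radius)
```

Training-time methods (Gaussian augmentation, SmoothAdv) retrain the base classifier so that this vote is accurate under noise; CAF instead attaches adapters to a frozen pre-trained feature extractor to achieve the same effect.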