Certifiable robustness guarantees that small perturbations around a classifier's input will not change its prediction. There are two main approaches to providing certifiable robustness against adversarial examples: (a) explicitly training classifiers with small Lipschitz constants, and (b) randomized smoothing, which adds random noise to the input to create a smooth classifier. We propose SPLITZ, a practical and novel approach that combines the synergistic benefits of both ideas in a single framework. Our main idea is to split a classifier into two halves, constrain the Lipschitz constant of the first half, and smooth the second half via randomization. The motivation for SPLITZ comes from the observation that many standard deep networks exhibit heterogeneity in Lipschitz constants across layers. SPLITZ can exploit this heterogeneity while inheriting the scalability of randomized smoothing. We present a principled approach to training SPLITZ and provide a theoretical analysis that yields certified robustness guarantees at inference time. We present a comprehensive comparison of robustness-accuracy trade-offs and show that SPLITZ consistently improves on existing state-of-the-art approaches on the MNIST, CIFAR-10, and ImageNet datasets. For instance, under an $\ell_2$ norm perturbation budget of $\epsilon=1$, SPLITZ achieves $43.2\%$ top-1 test accuracy on CIFAR-10, compared to the state-of-the-art top-1 test accuracy of $39.8\%$.
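The split-then-smooth idea above can be illustrated with a minimal sketch. This is not the paper's actual algorithm or training procedure; it is a toy NumPy construction under stated assumptions: the "first half" is a single linear map whose $\ell_2$ Lipschitz constant is its largest singular value, the "second half" is a trivial two-class rule smoothed by Gaussian Monte Carlo sampling (Cohen et al.-style), and the empirical top-class frequency is used directly in place of a proper confidence lower bound on $p_A$. The key point the sketch shows is how a certified radius at the intermediate representation is divided by the first half's Lipschitz constant to obtain an input-space radius.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Hypothetical "first half": a linear map. Its l2 Lipschitz constant
# is the largest singular value (spectral norm).
W = np.array([[0.6, 0.2],
              [0.1, 0.5]])
L = np.linalg.svd(W, compute_uv=False)[0]

def first_half(x):
    return W @ x

def second_half(z):
    # Toy two-class base classifier on the intermediate representation.
    return 0 if z[0] + z[1] > 0.5 else 1

def smoothed_predict(x, sigma=0.25, n=2000):
    """Randomized smoothing of the second half via Monte Carlo voting.

    Returns the smoothed prediction and a (heuristic) certified l2 radius
    in input space: the intermediate-layer radius sigma * Phi^{-1}(p_A),
    shrunk by the Lipschitz constant L of the first half.
    """
    z = first_half(x)
    noise = rng.normal(0.0, sigma, size=(n, z.shape[0]))
    votes = np.bincount([second_half(z + e) for e in noise], minlength=2)
    top = int(votes.argmax())
    # Empirical stand-in for the lower bound on p_A; clipped away from 1.0
    # so the inverse normal CDF stays finite.
    p_a = min(votes[top] / n, 1.0 - 1.0 / n)
    radius = sigma * NormalDist().inv_cdf(p_a) / L if p_a > 0.5 else 0.0
    return top, radius

pred, r = smoothed_predict(np.array([1.0, 1.0]))
```

Because the intermediate margin here is large relative to the noise scale, the smoothed classifier confidently predicts class 0 with a strictly positive certified radius; a smaller Lipschitz constant $L$ of the first half directly enlarges that input-space radius, which is the heterogeneity SPLITZ aims to exploit.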