Certifiable robustness gives the guarantee that small perturbations around an input to a classifier will not change the prediction. There are two approaches to provide certifiable robustness to adversarial examples: a) explicitly training classifiers with small Lipschitz constants, and b) randomized smoothing, which adds random noise to the input to create a smooth classifier. We propose \textit{SPLITZ}, a practical and novel approach which leverages the synergistic benefits of both of the above ideas in a single framework. Our main idea is to \textit{split} a classifier into two halves, constrain the Lipschitz constant of the first half, and smooth the second half via randomization. The motivation for \textit{SPLITZ} comes from the observation that many standard deep networks exhibit heterogeneity in Lipschitz constants across layers. \textit{SPLITZ} can exploit this heterogeneity while inheriting the scalability of randomized smoothing. We present a principled approach to train \textit{SPLITZ} and provide theoretical analysis to derive certified robustness guarantees during inference. We present a comprehensive comparison of robustness-accuracy tradeoffs and show that \textit{SPLITZ} consistently improves upon existing state-of-the-art approaches on the MNIST and CIFAR-10 datasets. For instance, with an $\ell_2$ norm perturbation budget of \textbf{$\epsilon=1$}, \textit{SPLITZ} achieves $\textbf{43.2\%}$ top-1 test accuracy on the CIFAR-10 dataset, compared to the state-of-the-art top-1 test accuracy of $\textbf{39.8\%}$.
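The split-then-smooth idea above can be sketched on a toy example. The snippet below is a minimal, hypothetical illustration (not the paper's implementation): the "first half" $h$ is a linear map whose $\ell_2$ Lipschitz constant is its spectral norm, the "second half" $g$ is a nearest-center classifier smoothed with Gaussian noise via Monte Carlo voting, and the standard smoothing radius in the intermediate space is divided by the Lipschitz bound of $h$ to obtain a certificate in the input space. All names, the toy classifier, and the sample counts are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# "First half" h: a linear map; its l2 Lipschitz constant is upper-bounded
# by the spectral norm of the weight matrix (illustrative values).
W = np.array([[0.6, 0.1],
              [0.0, 0.5]])

def h(x):
    return W @ x

L_h = np.linalg.norm(W, 2)  # spectral norm = Lipschitz bound for h

# "Second half" g: a toy 2-class nearest-center classifier.
centers = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

def g(z):
    return int(np.argmin(np.linalg.norm(centers - z, axis=1)))

def smoothed_predict(x, sigma=0.25, n=2000):
    """Monte Carlo estimate of the smoothed classifier z -> g(z + noise),
    applied at z = h(x), with a certified l2 radius pulled back through h."""
    z = h(x)
    votes = np.bincount(
        [g(z + sigma * rng.standard_normal(2)) for _ in range(n)],
        minlength=len(centers),
    )
    # Clamp the empirical top-class probability away from 1 so the
    # Gaussian quantile stays finite.
    p_a = min(votes.max() / n, 1.0 - 1.0 / n)
    if p_a <= 0.5:
        return int(votes.argmax()), 0.0  # abstain from certifying
    # Standard randomized-smoothing radius sigma * Phi^{-1}(p_a) holds in the
    # intermediate space; dividing by L_h converts it to an input-space radius,
    # since ||h(x) - h(x')|| <= L_h * ||x - x'||.
    radius = sigma * NormalDist().inv_cdf(p_a) / L_h
    return int(votes.argmax()), radius

pred, r = smoothed_predict(np.array([1.0, 0.0]))
```

Note how a smaller Lipschitz constant for the first half directly enlarges the certified radius, which is the synergy the abstract describes: Lipschitz training shrinks $L_h$ while smoothing certifies $g$.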