Randomized smoothing is a technique for providing provable robustness guarantees against adversarial attacks while making minimal assumptions about a classifier. The method takes a majority vote of an arbitrary base classifier over multiple noise-perturbed copies of an input to obtain a smoothed classifier, and it remains the tool of choice for certifying deep and complex neural network models. Nonetheless, non-trivial performance of such a smoothed classifier crucially depends on the base model being trained on noise-augmented data, i.e., on a smoothed input distribution. Although this noisy training is widely adopted in practice, it remains unclear precisely how it affects the risk of the robust smoothed classifier, which has led to heuristics and tricks that are poorly understood. In this work we analyze these trade-offs theoretically in a binary classification setting, proving that these common observations are not universal. We show that, without stronger distributional assumptions, no benefit can be expected from predictors trained with noise augmentation, and we further characterize distributions for which such a benefit is obtained. Our analysis has direct implications for the practical deployment of randomized smoothing, and we illustrate some of them via experiments on CIFAR-10 and MNIST, as well as on synthetic datasets.
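The majority-vote construction described in the abstract can be illustrated with a minimal sketch. It assumes isotropic Gaussian noise and a Monte Carlo estimate of the vote, neither of which the abstract specifies. The names `base_classifier` and `smoothed_predict`, the toy decision rule, and the parameters `sigma` and `n_samples` are hypothetical, chosen only for illustration. The sketch computes the smoothed prediction only and does not produce a robustness certificate.

```python
import numpy as np


def base_classifier(x):
    """Hypothetical binary base classifier: label 1 if the first coordinate is positive."""
    return (x[..., 0] > 0).astype(int)


def smoothed_predict(f, x, sigma, n_samples=1000, rng=None):
    """Monte Carlo majority vote of f over Gaussian-perturbed copies of x.

    Assumption: the noise is N(0, sigma^2 I); the abstract only says "noise-perturbed".
    Returns the majority label and the vote counts for labels 0 and 1.
    """
    rng = np.random.default_rng(rng)
    # Draw n_samples independent noise vectors with the same shape as x.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    # Evaluate the base classifier on every perturbed copy of the input.
    votes = f(x + noise)
    # Count votes per class (binary setting, so two bins).
    counts = np.bincount(votes, minlength=2)
    return int(np.argmax(counts)), counts


x = np.array([2.0, 0.0])
label, counts = smoothed_predict(base_classifier, x, sigma=0.5, rng=0)
print(label, counts)
```

In this toy setting the base classifier is fixed rather than trained. The question the abstract raises, namely how training the base model on noise-augmented data affects the risk of the smoothed classifier, would correspond to fitting `base_classifier` on inputs perturbed with the same noise before taking the vote.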