Along with recent diffusion models, randomized smoothing has become one of the few tangible approaches that offer adversarial robustness to models at scale, e.g., large pre-trained models. Specifically, one can perform randomized smoothing on any classifier via a simple "denoise-and-classify" pipeline, so-called denoised smoothing, provided that an accurate denoiser, such as a diffusion model, is available. In this paper, we present scalable methods to address the current trade-off between certified robustness and accuracy in denoised smoothing. Our key idea is to "selectively" apply smoothing among multiple noise scales, coined multi-scale smoothing, which can be efficiently implemented with a single diffusion model. This approach also suggests a new objective for comparing the collective robustness of multi-scale smoothed classifiers, and raises the question of which representation of a diffusion model would maximize this objective. To address this, we propose to further fine-tune the diffusion model (a) to perform consistent denoising whenever the original image is recoverable, but (b) to generate rather diverse outputs otherwise. Our experiments show that the proposed multi-scale smoothing scheme, combined with diffusion fine-tuning, achieves strong certified robustness at high noise levels while maintaining accuracy close to that of non-smoothed classifiers.
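For concreteness, the "denoise-and-classify" pipeline can be sketched as a majority vote over Gaussian-perturbed copies of the input. This is only a minimal illustration under stated assumptions, not the method proposed in the paper: the function names `smoothed_predict`, `toy_denoise`, and `toy_classify` are hypothetical, the toy denoiser and classifier stand in for a diffusion model and a pre-trained classifier, and the statistical certification step and the multi-scale selection rule are omitted.

```python
import numpy as np


def smoothed_predict(classify, denoise, x, sigma, n_samples=100, seed=0):
    """Majority vote of classify(denoise(x + noise)) over Gaussian noise draws.

    `classify` and `denoise` are caller-supplied callables (assumptions of
    this sketch); `sigma` is the standard deviation of the smoothing noise.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        # Perturb the input with isotropic Gaussian noise at scale sigma.
        noisy = x + sigma * rng.standard_normal(x.shape)
        # Denoise first, then classify the denoised input.
        label = classify(denoise(noisy, sigma))
        votes[label] = votes.get(label, 0) + 1
    # Return the most frequently predicted label.
    return max(votes, key=votes.get)


def toy_denoise(noisy, sigma):
    # Hypothetical stand-in for a diffusion denoiser: shrink toward zero.
    return noisy / (1.0 + sigma ** 2)


def toy_classify(x):
    # Hypothetical stand-in for a pre-trained classifier: sign of the mean.
    return int(x.mean() > 0)
```

Calling `smoothed_predict(toy_classify, toy_denoise, np.full(16, 1.0), sigma=0.5)` returns the majority label for an all-ones input. A multi-scale variant would, roughly, evaluate such a vote at several values of `sigma`; how the scales are selected is specific to the paper and is not reproduced here.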