Adversarial robustness has conventionally been regarded as a property that is difficult to encode into neural networks, requiring abundant training data. Under the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or impractical, and most such models were not originally trained with adversarial robustness in mind. In this paper, we develop a scalable and model-agnostic solution that achieves adversarial robustness without using any data. Our intuition is to view recent text-to-image diffusion models as "adaptable" denoisers that can be optimized for target tasks. Building on this view, we propose (a) a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) novel adaptation schemes that leverage only a few synthetic reference images generated by the text-to-image model. Our experiments show that our data-free scheme, applied to pre-trained CLIP, improves the (provable) adversarial robustness of its diverse zero-shot classification derivatives while maintaining their accuracy, significantly surpassing prior approaches that use the full training data. Beyond CLIP, we also demonstrate that our framework can efficiently robustify other visual classifiers.
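The denoise-and-classify pipeline in (a) follows the denoised-smoothing recipe: corrupt the input with Gaussian noise, denoise it, classify the result, and aggregate predictions by majority vote. Below is a minimal, hypothetical PyTorch sketch of that prediction step; the `denoise` and `classify` callables, `sigma`, and `n_samples` are illustrative placeholders rather than the paper's actual models or settings (in the paper's setting, `denoise` would be the text-to-image diffusion model and `classify` a frozen off-the-shelf classifier such as CLIP).

```python
# Minimal sketch of a denoise-and-classify (denoised smoothing) prediction.
# All names here are illustrative stand-ins, not the paper's implementation.
import torch

def smoothed_predict(x, denoise, classify, num_classes, sigma=0.25, n_samples=100):
    """Monte-Carlo estimate of the smoothed classifier's prediction:
    add Gaussian noise, denoise each sample, classify, and majority-vote."""
    votes = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)  # Gaussian corruption
        denoised = denoise(noisy)                # diffusion model as denoiser
        logits = classify(denoised)              # frozen base classifier
        votes[logits.argmax().item()] += 1
    return votes.argmax().item()

# Toy usage with dummy stand-ins (identity denoiser, random linear classifier).
if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(10, 3 * 32 * 32)             # placeholder classifier weights
    x = torch.randn(3, 32, 32)                   # placeholder input image
    pred = smoothed_predict(
        x,
        denoise=lambda z: z,                     # placeholder denoiser
        classify=lambda z: w @ z.flatten(),      # placeholder classifier
        num_classes=10,
    )
    print("smoothed prediction:", pred)
```

In full randomized smoothing, the same vote counts would additionally yield a certified robustness radius around the input (Cohen et al., 2019); the sketch above shows only the prediction step, not the certification.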