As more deep neural networks are deployed in everyday services, their reliability becomes essential. Yet deep neural networks are vulnerable and sensitive to adversarial attacks, and evasion-based attacks are the most common threat to such services. Recent work usually strengthens robustness through adversarial training or by leveraging a substantial amount of clean data. However, retraining and redeploying a model requires a large computational budget and causes heavy losses for an online service. Moreover, the service provider is likely to have only a limited number of adversarial examples at training time, and much of the clean data may be inaccessible. From an analysis of defenses for deployed models, we identify an important problem, which we name RaPiD (Rapid Plug-in Defender): how to rapidly defend a frozen original service model against a given attack using only a few clean and adversarial examples. Motivated by the generalization and universal computation abilities of pre-trained transformer models, we propose a new defense method, CeTaD, which stands for Considering Pretrained Transformers as Defenders. In particular, we evaluate the effectiveness and transferability of CeTaD with one-shot adversarial examples, and we explore the impact of CeTaD's different components and of the training data conditions. CeTaD is flexible across differentiable service models and suitable for various types of attacks.