Classifiers based on deep neural networks are vulnerable to adversarial attacks, and this widespread vulnerability has motivated research into defending them against potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions or training regimes. However, the model, data, and training specifics of the victim are usually unavailable to the user, and re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new post-train black-box defense framework that can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by a new joint Bayesian treatment of the clean data, the adversarial examples, and the classifier, which maximizes their joint probability. The framework is further equipped with a new post-train strategy that keeps the victim intact, avoiding re-training. We name our framework Bayesian Boundary Correction (BBC). BBC is general and flexible and can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, covering both static and dynamic data. Exhaustive evaluation shows that, compared with existing defense methods, BBC achieves superior robustness and enhances robustness without severely hurting clean accuracy.
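The joint Bayesian treatment above can be sketched as one plausible chain-rule factorization; the notation here (clean data $\mathcal{D}$, adversarial examples $\tilde{\mathcal{D}}$, classifier parameters $\theta$) is our illustrative assumption, not a formulation taken from the paper:

$$\theta^{*} = \arg\max_{\theta} \; p(\mathcal{D}, \tilde{\mathcal{D}}, \theta) = \arg\max_{\theta} \; p(\tilde{\mathcal{D}} \mid \mathcal{D}, \theta)\, p(\mathcal{D} \mid \theta)\, p(\theta),$$

where, under this reading, the first factor models adversarial examples conditioned on the clean data and the classifier, the second is the clean-data likelihood, and the prior $p(\theta)$ could be anchored at the pre-trained weights so that the victim model itself remains intact.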