Classifiers based on deep neural networks have recently been challenged by adversarial attacks, and this widespread vulnerability has motivated research on defending them against potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions or training regimes. Since the model, data, and training specifics of the victim are usually unavailable to the user, re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new black-box defense framework. It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by a new joint Bayesian treatment of the clean data, the adversarial examples, and the classifier, which maximizes their joint probability. The framework is further equipped with a new post-training strategy that keeps the victim intact. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, covering both static and dynamic data. Exhaustive evaluation shows that, compared with existing defense methods, BBC achieves superior robustness without severely hurting clean accuracy.