Classifiers based on deep neural networks have recently been challenged by adversarial attacks, and their widespread vulnerability has prompted research into defending them against potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions or training regimes. However, the model, data, and training specifics of the victim are usually unavailable to the user, and re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new black-box defense framework that can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by new joint Bayesian treatments of the clean data, the adversarial examples, and the classifier, which maximize their joint probability. The framework is further equipped with a new post-train strategy that keeps the victim intact. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that adapts easily to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, covering both static and dynamic data. Exhaustive evaluation shows that, compared with existing defense methods, BBC achieves superior robustness without severely hurting clean accuracy.