The adversarial phenomenon has been widely observed in machine learning (ML) systems, especially those built on deep neural networks: in certain specific cases, an ML system may produce predictions that are inconsistent with human judgment and incomprehensible to humans. This phenomenon poses a serious security threat to the practical application of ML systems, and several advanced attack paradigms have been developed to explore it, mainly backdoor attacks, weight attacks, and adversarial examples. For each attack paradigm, various defense paradigms have been developed to improve model robustness against that particular attack. However, because these defense paradigms are mutually independent and highly diverse, it is difficult to assess the overall robustness of an ML system against different kinds of attacks. This survey aims to provide a systematic review of all existing defense paradigms from a unified perspective. Specifically, adopting a life-cycle perspective, we decompose a complete ML system into five stages: pre-training, training, post-training, deployment, and inference. We then present a clear taxonomy to categorize and review representative defense methods at each stage. The unified perspective and the presented taxonomies not only facilitate analysis of the mechanism behind each defense paradigm but also clarify the connections and differences among defense paradigms, which may inspire future research on more advanced and comprehensive defenses.