Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to confuse the model into making a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the area of adversarial robustness investigates the mechanisms behind adversarial attacks and the defenses against them. This survey reviews the subset of this literature that investigates properties of training data in the context of model robustness under evasion attacks. It first summarizes the main properties of data that lead to adversarial vulnerability. It then discusses guidelines and techniques for improving adversarial robustness by enhancing the data representation and learning procedures, as well as techniques for estimating robustness guarantees given particular data. Finally, it discusses knowledge gaps and promising future research directions in this area.