Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the field of adversarial robustness investigates the mechanisms behind adversarial attacks and the defenses against them. This survey reviews a subset of that literature: work that investigates how properties of the training data affect model robustness under evasion attacks. It first summarizes the main properties of data that lead to adversarial vulnerability. It then discusses guidelines and techniques for improving adversarial robustness by enhancing the data representation and learning procedures, as well as techniques for estimating robustness guarantees for given data. Finally, it discusses knowledge gaps and promising directions for future research in this area.