Despite extraordinary progress, current machine learning systems have been shown to be brittle against adversarial examples: seemingly innocuous but carefully crafted perturbations of test examples that cause machine learning predictors to misclassify. Can we learn predictors that are robust to adversarial examples, and if so, how? There has been much empirical interest in this contemporary challenge in machine learning; in this thesis, we address it from a theoretical perspective. We explore what robustness properties we can hope to guarantee against adversarial examples and develop an understanding of how to guarantee them algorithmically. We illustrate the need to go beyond traditional approaches and principles such as empirical risk minimization and uniform convergence, and make contributions that can be categorized as follows: (1) introducing problem formulations that capture aspects of emerging practical challenges in robust learning, (2) designing new learning algorithms with provable robustness guarantees, and (3) characterizing the complexity of robust learning and fundamental limitations on the performance of any algorithm.