Sample Complexity of Robust Learning against Evasion Attacks

It is becoming increasingly important to understand the vulnerability of machine learning models to adversarial attacks. One of the fundamental problems in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks, where data is corrupted at test time. In this thesis, we work with the exact-in-the-ball notion of robustness and study the feasibility of adversarially robust learning from the perspective of learning theory, focusing on sample complexity. We first explore the setting where the learner has access to random examples only, and show that distributional assumptions are essential. We then focus on learning problems with distributions on the input data that satisfy a Lipschitz condition and show that robustly learning monotone conjunctions has sample complexity at least exponential in the adversary's budget (the maximum number of bits it can perturb on each input). However, if the adversary is restricted to perturbing $O(\log n)$ bits, then one can robustly learn conjunctions and decision lists with respect to log-Lipschitz distributions. We then study learning models where the learner is given more power. We first consider local membership queries, where the learner can query the label of points near the training sample. We show that, under the uniform distribution, the exponential dependence on the adversary's budget for robustly learning conjunctions remains unavoidable. We then introduce a local equivalence query oracle, which returns whether the hypothesis and target concept agree in a given region around a point in the training sample, and a counterexample if one exists. We show that if the query radius is equal to the adversary's budget, we can develop robust empirical risk minimization algorithms in the distribution-free setting. We give general upper and lower bounds on the query complexity, as well as bounds for concrete concept classes.
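To make the exact-in-the-ball notion concrete, the following is a minimal brute-force sketch, not code from the thesis. It assumes inputs in $\{0,1\}^n$, an adversary that flips at most `rho` bits (a Hamming ball), and the uniform distribution; all function names are hypothetical. Under this notion, a hypothesis incurs a robust loss at a point whenever it disagrees with the target concept anywhere in the ball around that point.

```python
from itertools import combinations, product


def conj(indices):
    """Monotone conjunction over the given variable indices (illustrative helper)."""
    return lambda x: int(all(x[i] for i in indices))


def ball(x, rho):
    """Yield every point within Hamming distance rho of x."""
    n = len(x)
    for k in range(rho + 1):
        for flips in combinations(range(n), k):
            z = list(x)
            for i in flips:
                z[i] ^= 1
            yield tuple(z)


def exact_in_ball_loss(h, c, x, rho):
    """Return 1 if h and c disagree at some point of the ball around x, else 0."""
    return int(any(h(z) != c(z) for z in ball(x, rho)))


def robust_risk_uniform(h, c, n, rho):
    """Exact-in-the-ball robust risk under the uniform distribution on {0,1}^n.

    Enumerates all 2^n points, so this is only feasible for small n.
    """
    points = list(product((0, 1), repeat=n))
    return sum(exact_in_ball_loss(h, c, x, rho) for x in points) / len(points)


if __name__ == "__main__":
    target = conj([0, 1])         # c(x) = x0 AND x1
    hypothesis = conj([0, 1, 2])  # h(x) = x0 AND x1 AND x2
    for rho in range(4):
        print(rho, robust_risk_uniform(hypothesis, target, 4, rho))
```

In this toy instance the two conjunctions disagree only on points with $x_0 = x_1 = 1$ and $x_2 = 0$, so the standard risk (`rho = 0`) is small, while the robust risk grows quickly as the budget increases, because more and more points have a disagreement somewhere in their ball.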
