Robust Distribution Learning with Local and Global Adversarial Corruptions

We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (*global* corruptions) and the remaining perturbations have average magnitude bounded by $\rho$ (*local* corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$ for all orthogonal projections $\Pi \in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(\Pi) = k$. This allows us to account simultaneously for mean estimation ($k=1$), distribution estimation ($k=d$), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + d^{O(1)}\tilde{O}(n^{-1/k})$ when $P$ has bounded moments of order $2+\delta$, for constant $\delta > 0$. For data distributions with bounded covariance, our finite-sample bounds match the minimax population-level optimum for large sample sizes. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.
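To make the corruption model and the rank-one ($k=1$) projected objective concrete, the following is a minimal sketch and not the estimator developed in this work. It assumes equal-size empirical samples, for which the one-dimensional $\mathsf{W}_1$ distance reduces to the mean absolute difference between sorted values. The adversary is a hypothetical random one rather than a worst-case one, and the names `w1_1d`, `project`, `corrupt`, and `outlier_scale` are illustrative choices that do not come from the source.

```python
import math
import random


def w1_1d(xs, ys):
    """W1 between two equal-size empirical measures on the real line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D the optimal coupling matches sorted samples to each other.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def project(points, theta):
    """Project points in R^d onto the unit direction theta (a rank-1 projection)."""
    return [sum(p_i * t_i for p_i, t_i in zip(p, theta)) for p in points]


def corrupt(points, eps, rho, rng, outlier_scale=100.0):
    """Hypothetical adversary (an assumption, not the worst case):
    shift every point by norm rho (local), then overwrite an
    eps-fraction of the points with a far-away outlier (global)."""
    d = len(points[0])
    out = []
    for p in points:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g)) or 1.0
        out.append([p_i + rho * v / norm for p_i, v in zip(p, g)])
    n_bad = int(eps * len(points))
    for i in rng.sample(range(len(points)), n_bad):
        out[i] = [outlier_scale] * d
    return out


rng = random.Random(0)
d, n = 3, 1000
clean = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
theta = [1.0, 0.0, 0.0]  # one fixed direction; the task asks for all of them

# Local corruptions only: the projected W1 error cannot exceed rho.
local_only = corrupt(clean, eps=0.0, rho=0.1, rng=rng)
err_local = w1_1d(project(clean, theta), project(local_only, theta))

# Adding global corruptions: the raw empirical measure is no longer reliable.
both = corrupt(clean, eps=0.05, rho=0.1, rng=rng)
err_both = w1_1d(project(clean, theta), project(both, theta))
```

With local corruptions alone, `err_local` stays below $\rho$ because projecting onto a unit direction cannot lengthen a shift. Once an $\varepsilon$-fraction of outliers is added, the uncorrected empirical measure can be arbitrarily far from $P$ in $\mathsf{W}_1$, which is why a robust estimator is needed.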
