We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (global corruptions) and the remaining perturbations have average magnitude bounded by $\rho$ (local corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$ for all orthogonal projections $\Pi \in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(\Pi) = k$. This allows us to account simultaneously for mean estimation ($k=1$) and distribution estimation ($k=d$), as well as all settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm whose error is bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\,n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. This guarantee holds uniformly in $k$ and is minimax optimal up to the sub-optimality of the plug-in estimator when $\rho = \varepsilon = 0$. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.
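To make the corruption model and the projected-$\mathsf{W}_1$ objective concrete, the following minimal sketch simulates both corruption types for a toy choice of $P$ and evaluates the $k=1$ case. The parameter names, the choice $P = \mathcal{N}(0, I_d)$, and the outlier pattern are illustrative assumptions only; this is not the paper's estimator, merely an instance of the setting it studies.

```python
# Illustrative sketch of the corruption model (assumptions: P = N(0, I_d),
# a fixed projection direction, and gross outliers at a constant point).
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
eps, rho = 0.05, 0.1  # global corruption fraction, local perturbation budget

# Clean samples from P = N(0, I_d).
X = rng.standard_normal((n, d))

# Local corruptions: perturbations whose average Euclidean norm equals rho.
Z = rng.standard_normal((n, d))
Z *= rho / np.mean(np.linalg.norm(Z, axis=1))
X_tilde = X + Z

# Global corruptions: an adversary replaces an eps-fraction of samples arbitrarily.
idx = rng.choice(n, size=int(eps * n), replace=False)
X_tilde[idx] = 50.0  # gross outliers

# Rank-1 orthogonal projection (k = 1) along a fixed unit direction u.
u = np.zeros(d)
u[0] = 1.0
proj_clean, proj_corrupt = X @ u, X_tilde @ u

# For equal-size empirical measures on the real line, W1 is the mean absolute
# difference of the sorted samples.
w1 = np.mean(np.abs(np.sort(proj_clean) - np.sort(proj_corrupt)))
print(f"W1 between 1-D projections: {w1:.3f}")
```

As the sketch shows, even the plug-in empirical measure incurs projected error on the order of the outlier magnitude times $\varepsilon$, which is why a robust estimator is needed to attain the $\sqrt{\varepsilon k} + \rho$ scaling stated above.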