We study high-dimensional robust statistics tasks in the streaming model. A recent line of work has obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory usage at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and, more generally, robust stochastic optimization.
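For orientation, the setting above can be written out as follows. This is a sketch of the standard form of Huber's contamination model, not the paper's exact definition: the symbols $D$, $N$, $\epsilon$, $\mu$, $n$, and $d$ are notation assumed here for illustration, and the particular strengthening of the model that the abstract refers to is not specified in this text.

```latex
% Illustrative notation only; none of these symbols are fixed by the abstract.
\begin{align*}
  &\text{Data model:}
    && X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} (1-\epsilon)\, D + \epsilon\, N,
       \qquad 0 < \epsilon < \tfrac{1}{2}, \\
  &&& D \text{ an inlier distribution on } \mathbb{R}^d \text{ with mean } \mu,
       \quad N \text{ an arbitrary outlier distribution}, \\
  &\text{Goal:}
    && \text{output } \hat{\mu} \text{ with } \lVert \hat{\mu} - \mu \rVert_2
       \text{ small, reading each } X_i \text{ once, in order}, \\
  &\text{Storing all samples:}
    && n \cdot d \text{ numbers, which is at least } d^2 \text{ whenever } n \ge d, \\
  &\text{Streaming target:}
    && d \cdot \operatorname{polylog}(d) \text{ numbers (nearly linear in } d\text{)}.
\end{align*}
```

The last two lines restate the memory comparison made in the abstract: keeping the whole dataset costs one number per coordinate per sample, whereas the single-pass algorithm is claimed to need space only nearly linear in the dimension.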