Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong contamination model, where a constant fraction of datapoints are arbitrarily corrupted. We introduce a general framework for efficiently finding an approximate SOSP with \emph{dimension-independent} accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples, where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints. As a concrete application, we apply our framework to the problem of low-rank matrix sensing, developing efficient and provably robust algorithms that can tolerate corruptions in both the sensing matrices and the measurements. In addition, we establish a Statistical Query lower bound providing evidence that the quadratic dependence on $D$ in the sample complexity is necessary for computationally efficient algorithms.
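To make the notion of an approximate SOSP concrete, the following is a minimal sketch of the standard two-part check: the gradient norm is small and the smallest Hessian eigenvalue is not too negative. The function name `is_approx_sosp` and the tolerances `g_tol` and `h_tol` are illustrative assumptions rather than notation from this paper; in particular, they are unrelated to $\epsilon$, which here denotes the corruption fraction. The sketch assumes exact, uncorrupted derivatives and does not implement the robust framework itself.

```python
import numpy as np

def is_approx_sosp(grad, hess, g_tol, h_tol):
    """Check the usual approximate-SOSP conditions (tolerances are assumptions):
    ||grad|| <= g_tol  and  lambda_min(hess) >= -h_tol."""
    grad = np.asarray(grad, dtype=float)
    hess = np.asarray(hess, dtype=float)
    grad_small = np.linalg.norm(grad) <= g_tol
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
    # so index 0 is the smallest eigenvalue.
    min_eig = np.linalg.eigvalsh(hess)[0]
    return bool(grad_small and min_eig >= -h_tol)

# Toy objectives evaluated at the origin, where both gradients vanish.
origin_grad = np.zeros(2)
bowl_hess = np.array([[2.0, 0.0], [0.0, 2.0]])     # f(x, y) = x^2 + y^2
saddle_hess = np.array([[2.0, 0.0], [0.0, -2.0]])  # f(x, y) = x^2 - y^2

bowl_ok = is_approx_sosp(origin_grad, bowl_hess, g_tol=1e-3, h_tol=1e-3)
saddle_ok = is_approx_sosp(origin_grad, saddle_hess, g_tol=1e-3, h_tol=1e-3)
```

The origin of the saddle is a first-order stationary point but fails the second-order condition, which is the distinction that separates SOSPs from plain stationary points.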