Outlier-robust additive matrix decomposition and robust matrix completion

Comments (from arXiv): This paper studies a broader model but shares content with arXiv:2012.06750 (which will not be further revised). Typos corrected. Contrary to what is stated in arXiv:2012.06750, Bellec et al. (2018) DOES achieve the optimal rate for uncorrupted sparse linear regression (but assuming noise independent of the features).

We study least-squares trace regression when the parameter is the sum of an $r$-low-rank matrix and an $s$-sparse matrix and a fraction $\epsilon$ of the labels is corrupted. For subgaussian distributions, we highlight three design properties. The first, termed $\PP$, handles additive decomposition and follows from a product process inequality. The second, termed $\IP$, handles both label contamination and additive decomposition. It follows from Chevet's inequality. The third, termed $\MP$, handles the interaction between the design and feature-dependent noise. It follows from a multiplier process inequality. Jointly, these properties entail the near-optimality of a tractable estimator with respect to the effective dimensions $d_{\eff,r}$ and $d_{\eff,s}$ of the low-rank and sparse components, the contamination fraction $\epsilon$, and the failure probability $\delta$. This rate has the form $$ \mathsf{r}(n,d_{\eff,r}) + \mathsf{r}(n,d_{\eff,s}) + \sqrt{(1+\log(1/\delta))/n} + \epsilon\log(1/\epsilon). $$ Here, $\mathsf{r}(n,d_{\eff,r})+\mathsf{r}(n,d_{\eff,s})$ is the optimal uncontaminated rate, independent of $\delta$. Our estimator is adaptive to $(s,r,\epsilon,\delta)$ and, for a fixed absolute constant $c>0$, it attains the stated rate with probability $1-\delta$ uniformly over all $\delta\ge\exp(-cn)$. Setting matrix decomposition aside, our analysis also entails optimal bounds for a robust estimator that adapts to the noise variance. Finally, we consider robust matrix completion. We highlight a new property for this problem: one can robustly and optimally estimate the incomplete matrix regardless of the \emph{magnitude of the corruption}. Our estimators are based on ``sorted'' versions of Huber's loss. We present simulations matching the theory. In particular, they reveal the superiority of the ``sorted'' Huber loss over the classical Huber loss.
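To get a feel for how the terms of the rate above trade off, the following is a minimal Python sketch that evaluates them numerically. Only the confidence term $\sqrt{(1+\log(1/\delta))/n}$ and the contamination term $\epsilon\log(1/\epsilon)$ are taken from the abstract. The abstract does not give the form of $\mathsf{r}(n,d)$, so it is passed in as a callable, and `placeholder_r` is a purely hypothetical stand-in used for illustration. All function names are illustrative and not from the paper.

```python
import math
from typing import Callable


def confidence_term(n: int, delta: float) -> float:
    """sqrt((1 + log(1/delta)) / n): the delta-dependent term of the rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return math.sqrt((1.0 + math.log(1.0 / delta)) / n)


def contamination_term(eps: float) -> float:
    """eps * log(1/eps): the label-contamination term (taken as 0 at eps = 0)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return 0.0 if eps == 0.0 else eps * math.log(1.0 / eps)


def rate(
    n: int,
    d_eff_r: float,
    d_eff_s: float,
    eps: float,
    delta: float,
    r: Callable[[int, float], float],
) -> float:
    """Sum of the four terms in the displayed rate, for a user-supplied r(n, d)."""
    return (
        r(n, d_eff_r)
        + r(n, d_eff_s)
        + confidence_term(n, delta)
        + contamination_term(eps)
    )


def placeholder_r(n: int, d: float) -> float:
    # ASSUMPTION: hypothetical stand-in for r(n, d); the abstract does not
    # specify its form, so do not read this as the paper's definition.
    return math.sqrt(d / n)


if __name__ == "__main__":
    value = rate(n=10_000, d_eff_r=50, d_eff_s=200, eps=0.01, delta=0.05,
                 r=placeholder_r)
    print(f"illustrative rate value: {value:.4f}")
```

At the smallest admissible failure probability $\delta=\exp(-cn)$, the confidence term equals $\sqrt{1/n + c}$, which does not vanish as $n$ grows. This is consistent with the restriction $\delta\ge\exp(-cn)$ in the abstract.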
