Outlier-robust additive matrix decomposition

arXiv comments: This paper studies a broader model but shares content with arXiv:2012.06750 (which will not be further revised). Changes: correction of typos, additional simulations, removal of robust matrix completion. Contrary to what is stated in arXiv:2012.06750, Bellec et al. (2018) does achieve the optimal rate for uncorrupted sparse linear regression (but assuming noise independent of the features).

We study least-squares trace regression when the parameter is the sum of an $r$-low-rank matrix and an $s$-sparse matrix and a fraction $\epsilon$ of the labels is corrupted. For subgaussian distributions and feature-dependent noise, we highlight three required design properties, each derived from a different process inequality: a "product process inequality", "Chevet's inequality" and a "multiplier process inequality". These properties simultaneously handle additive decomposition, label contamination and design-noise interaction. They imply the near-optimality of a tractable estimator with respect to the effective dimensions $d_{eff,r}$ and $d_{eff,s}$ of the low-rank and sparse components, the contamination fraction $\epsilon$ and the failure probability $\delta$. The near-optimal rate is $\mathsf{r}(n,d_{eff,r}) + \mathsf{r}(n,d_{eff,s}) + \sqrt{(1+\log(1/\delta))/n} + \epsilon\log(1/\epsilon)$, where $\mathsf{r}(n,d_{eff,r})+\mathsf{r}(n,d_{eff,s})$ is the optimal rate on average with no contamination. Our estimator is adaptive to $(s,r,\epsilon,\delta)$ and, for a fixed absolute constant $c>0$, it attains this rate with probability $1-\delta$ uniformly over all $\delta\ge\exp(-cn)$. Without matrix decomposition, our analysis also yields optimal bounds for a robust estimator that adapts to the noise variance. Our estimators are based on "sorted" versions of Huber's loss. We present simulations that match the theory. In particular, they reveal the superiority of "sorted" Huber losses over the classical Huber loss.
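To make the data model concrete, the following is a minimal simulation sketch of trace regression with a low-rank plus sparse parameter and a fraction $\epsilon$ of corrupted labels. It is an illustration under simplifying assumptions, not the paper's experimental setup: the function name, the Gaussian design, the feature-independent Gaussian noise, the default sizes, and the form of the corruption (large additive Gaussian shifts) are all hypothetical choices. It generates data only and does not implement the "sorted" Huber estimator.

```python
import numpy as np


def simulate_corrupted_trace_regression(
    n=200, d1=10, d2=8, r=2, s=5, eps=0.05, sigma=0.1, seed=0
):
    """Draw (X_i, y_i) with y_i = <X_i, B + Gamma> + noise, then corrupt
    a fraction eps of the labels. All distributional choices here are
    illustrative assumptions, not taken from the paper."""
    rng = np.random.default_rng(seed)

    # Low-rank component: product of two thin Gaussian factors (rank <= r).
    B = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))

    # Sparse component: exactly s nonzero entries at random positions.
    Gamma = np.zeros((d1, d2))
    support = rng.choice(d1 * d2, size=s, replace=False)
    Gamma.flat[support] = rng.standard_normal(s)

    # Gaussian design matrices X_i (assumed; the paper allows subgaussian designs).
    X = rng.standard_normal((n, d1, d2))

    # Trace inner product <X_i, B + Gamma> = tr(X_i^T (B + Gamma)), plus noise.
    # Noise is independent of X here for simplicity (an assumption).
    y = np.einsum("ijk,jk->i", X, B + Gamma) + sigma * rng.standard_normal(n)

    # Corrupt a fraction eps of the labels with large arbitrary shifts.
    n_bad = int(eps * n)
    bad = rng.choice(n, size=n_bad, replace=False)
    y[bad] += 10.0 * rng.standard_normal(n_bad)

    return X, y, B, Gamma, bad


X, y, B, Gamma, bad = simulate_corrupted_trace_regression()
print(X.shape, y.shape, np.linalg.matrix_rank(B), np.count_nonzero(Gamma), bad.size)
```

Running the same function with `eps=0.0` and the same seed gives the uncontaminated labels, which is a convenient way to compare a robust estimator against least squares on identical designs.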
