We study least-squares trace regression when the parameter is the sum of an $r$-low-rank matrix and an $s$-sparse matrix and a fraction $\epsilon$ of the labels is corrupted. For subgaussian distributions, we highlight three design properties. The first, termed $\PP$, handles additive decomposition and follows from a product process inequality. The second, termed $\IP$, handles both label contamination and additive decomposition; it follows from Chevet's inequality. The third, termed $\MP$, handles the interaction between the design and feature-dependent noise; it follows from a multiplier process inequality. Jointly, these properties entail the near-optimality of a tractable estimator with respect to the effective dimensions $d_{\eff,r}$ and $d_{\eff,s}$ of the low-rank and sparse components, the contamination fraction $\epsilon$, and the failure probability $\delta$. The rate has the form $$ \mathsf{r}(n,d_{\eff,r}) + \mathsf{r}(n,d_{\eff,s}) + \sqrt{(1+\log(1/\delta))/n} + \epsilon\log(1/\epsilon). $$ Here, $\mathsf{r}(n,d_{\eff,r})+\mathsf{r}(n,d_{\eff,s})$ is the optimal uncontaminated rate, which is independent of $\delta$. Our estimator is adaptive to $(s,r,\epsilon,\delta)$ and, for a fixed absolute constant $c>0$, it attains this rate with probability $1-\delta$ uniformly over all $\delta\ge\exp(-cn)$. Setting matrix decomposition aside, our analysis also entails optimal bounds for a robust estimator that adapts to the noise variance. Finally, we consider robust matrix completion. We highlight a new property for this problem: one can robustly and optimally estimate the incomplete matrix regardless of the \emph{magnitude of the corruption}. Our estimators are based on ``sorted'' versions of Huber's loss. We present simulations that match the theory. In particular, they reveal the superiority of the ``sorted'' Huber loss over the classical Huber loss.
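To make the setting concrete, the following is a minimal Python sketch of the data model described above: labels $y_i = \mathrm{tr}(X_i^\top(B+\Gamma)) + \xi_i$ with $B$ of rank at most $r$, $\Gamma$ with $s$ nonzero entries, and a fraction $\epsilon$ of labels corrupted. The function names, the Gaussian design, the noise level, and the additive corruption mechanism are illustrative assumptions and are not taken from the paper. The second helper only evaluates the two explicit terms of the rate, $\sqrt{(1+\log(1/\delta))/n}$ and $\epsilon\log(1/\epsilon)$; the function $\mathsf{r}$ is not specified here and is therefore left out.

```python
import numpy as np


def simulate_corrupted_trace_regression(n, d1, d2, r, s, eps,
                                        noise_std=0.1,
                                        corruption_scale=10.0,
                                        seed=0):
    """Hypothetical generator for the low-rank + sparse trace regression model.

    All distributional choices (Gaussian design, Gaussian noise, additive
    Gaussian corruption) are assumptions made for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Low-rank component: product of two thin factors, so rank is at most r.
    U = rng.standard_normal((d1, r))
    V = rng.standard_normal((d2, r))
    low_rank = U @ V.T / np.sqrt(r)

    # Sparse component: s entries placed on a random support.
    sparse = np.zeros(d1 * d2)
    support = rng.choice(d1 * d2, size=s, replace=False)
    sparse[support] = rng.standard_normal(s)
    sparse = sparse.reshape(d1, d2)

    # Gaussian design matrices (one example of a subgaussian design).
    X = rng.standard_normal((n, d1, d2))

    # Trace inner product tr(X_i^T (B + Gamma)) equals the entrywise sum.
    y = np.einsum("nij,ij->n", X, low_rank + sparse)
    y = y + noise_std * rng.standard_normal(n)

    # Corrupt floor(eps * n) labels with large additive perturbations.
    n_corrupt = int(np.floor(eps * n))
    corrupted = rng.choice(n, size=n_corrupt, replace=False)
    y[corrupted] += corruption_scale * rng.standard_normal(n_corrupt)

    return X, y, low_rank, sparse, corrupted


def confidence_and_contamination_terms(n, eps, delta):
    """Evaluate sqrt((1 + log(1/delta)) / n) and eps * log(1/eps)."""
    confidence = np.sqrt((1.0 + np.log(1.0 / delta)) / n)
    contamination = eps * np.log(1.0 / eps) if eps > 0 else 0.0
    return confidence, contamination
```

For instance, calling `confidence_and_contamination_terms(n, eps, delta)` for a grid of `eps` values shows how the contamination term $\epsilon\log(1/\epsilon)$ comes to dominate the confidence term as $\epsilon$ grows with $n$ and $\delta$ held fixed.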