Robust Principal Component Analysis (RPCA) aims to recover a low-rank structure from noisy, partially observed data that is also corrupted by sparse, potentially large-magnitude outliers. Traditional RPCA models rely on convex relaxations, such as the nuclear norm and the $\ell_1$ norm, to approximate the rank of one matrix and the $\ell_0$ functional (the number of non-zero elements) of another. In this work, we advocate a nonconvex regularization method, referred to as transformed $\ell_1$ (TL1), to improve both approximations. The rationale is that, by varying the internal parameter of TL1, its behavior asymptotically approaches either $\ell_0$ or $\ell_1$. Since the rank equals the number of non-zero singular values and the nuclear norm is defined as their sum, applying TL1 to the singular values can approximate either the rank or the nuclear norm, depending on its internal parameter. We conduct a fine-grained theoretical analysis of statistical convergence rates, measured in the Frobenius norm, for both the low-rank and sparse components under general sampling schemes. These rates are comparable to those of the classical RPCA model based on the nuclear norm and the $\ell_1$ norm. Moreover, we establish constant-order upper bounds on the estimated rank of the low-rank component and the cardinality of the sparse component in the regime where TL1 behaves like $\ell_0$, assuming that the respective matrices are exactly low-rank and exactly sparse. Extensive numerical experiments on synthetic data and real-world applications demonstrate that the proposed approach achieves higher accuracy than the classical convex model, especially under non-uniform sampling schemes.
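The limiting behavior described above can be checked numerically. The sketch below, a minimal illustration rather than the paper's implementation, uses the standard scalar TL1 penalty $\rho_a(x) = (a+1)|x|/(a+|x|)$ and applies it to singular values; the function names `tl1` and `tl1_matrix` are illustrative choices, not identifiers from the paper.

```python
import numpy as np

def tl1(x, a):
    # Transformed L1 penalty, elementwise: rho_a(x) = (a+1)|x| / (a + |x|), a > 0.
    x = np.abs(np.asarray(x, dtype=float))
    return (a + 1.0) * x / (a + x)

def tl1_matrix(M, a):
    # TL1 applied to singular values: sum_i rho_a(sigma_i(M)).
    # For small a this approximates rank(M); for large a, the nuclear norm.
    s = np.linalg.svd(M, compute_uv=False)
    return tl1(s, a).sum()

x = np.array([0.0, 0.5, 2.0, -3.0])
# As a -> 0+, TL1 counts nonzeros (l0 behavior): sum is close to 3.
print(tl1(x, 1e-6).sum())
# As a -> infinity, TL1 approaches |x| (l1 behavior): sum is close to 5.5.
print(tl1(x, 1e6).sum())

# Rank-1 matrix: with small a, the singular-value TL1 is close to 1 = rank.
M = np.outer([1.0, 2.0], [3.0, 4.0])
print(tl1_matrix(M, 1e-6))
```

The single parameter `a` thus interpolates between the two classical surrogates, which is the mechanism the abstract exploits for both the low-rank and the sparse component.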