The singular value decomposition (SVD) is a crucial tool in machine learning and statistical data analysis, but it is highly susceptible to outliers in the data matrix. Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers. This study introduces Spherically Normalized SVD, an efficient algorithm for robust SVD approximation that is highly insensitive to outliers, computationally scalable, and accurate in approximating singular vectors. The algorithm is fast because it requires only two applications of a standard reduced-rank SVD algorithm to appropriately scaled data, and it significantly outperforms competing algorithms in computation time. To assess the robustness of the approximated singular vectors and their subspaces against data contamination, we introduce new notions of breakdown points for matrix-valued input, including row-wise, column-wise, and block-wise breakdown points. Theoretical and empirical analyses demonstrate that our algorithm has higher breakdown points than standard SVD and its modifications. We empirically validate the approach in applications such as robust low-rank approximation and robust principal component analysis of high-dimensional microarray datasets. Overall, the study presents an efficient and robust SVD approximation that overcomes the limitations of existing algorithms in the presence of outliers.
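The abstract does not spell out the scaling, so the following is only a minimal sketch of the general idea of running a standard SVD on spherically scaled data, not the authors' algorithm. It assumes that "appropriately scaled" can be illustrated by rescaling each row to unit Euclidean norm before estimating right singular vectors, and symmetrically each column before estimating left singular vectors. The function names `normalized_right_vectors` and `normalized_left_vectors` are hypothetical, and a thin `numpy.linalg.svd` followed by truncation stands in for the reduced-rank SVD routine mentioned in the abstract.

```python
import numpy as np


def normalized_right_vectors(X, rank, eps=1e-12):
    """Leading right singular vectors of the row-normalized matrix.

    Assumption (not from the abstract): each row is rescaled to unit
    Euclidean norm, so no single row can dominate the decomposition.
    """
    row_norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    X_sphere = X / np.maximum(row_norms, eps)
    # Thin SVD, then keep the first `rank` right singular vectors.
    _, _, Vt = np.linalg.svd(X_sphere, full_matrices=False)
    return Vt[:rank].T  # shape (n_columns, rank)


def normalized_left_vectors(X, rank, eps=1e-12):
    """Leading left singular vectors of the column-normalized matrix.

    Same idea applied to the transpose: columns are rescaled to unit norm.
    """
    return normalized_right_vectors(X.T, rank, eps)  # shape (n_rows, rank)
```

Under these assumptions, the two calls correspond to the two SVD applications the abstract refers to. A single row with very large entries contributes no more than any other row after row normalization, which is the intuition behind insensitivity to row-wise contamination; how the actual method estimates singular values and combines the two decompositions is not stated in the abstract and is left out here.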