A CUR factorization is often used as a substitute for the singular value decomposition (SVD), especially when the singular vectors are difficult to interpret concretely. Moreover, if the original data matrix has properties such as nonnegativity and sparsity, a CUR decomposition preserves them better than the SVD does. An essential aspect of this approach is the method used to select a subset of columns and rows from the original matrix. This study investigates the effectiveness of \emph{one-round sampling} and iterative subselection techniques and introduces new iterative subselection strategies based on iterative SVDs. One provably appropriate technique for index selection in constructing a CUR factorization is the discrete empirical interpolation method (DEIM). Our contribution aims to improve the approximation quality of the DEIM scheme by invoking it iteratively over several rounds, so that subsequent columns and rows are selected based on the previously selected ones. To this end, we modify the data matrix $A$ after each round by removing the information already captured by the previously selected columns and rows. We also discuss how iterative procedures for computing a few singular vectors of large data matrices can be integrated with the new iterative subselection strategies. We present numerical experiments that compare one-round sampling with iterative subselection techniques and demonstrate the improved approximation quality of the latter.
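To make the distinction between one-round sampling and iterative subselection concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions that the abstract does not specify: the middle factor is taken as $M = C^{+} A R^{+}$, "removing the captured information" is realized as the residual $A - CMR$, a dense `np.linalg.svd` stands in for an iterative SVD solver, and a small guard prevents an index from being selected twice. The names `deim`, `cur_residual`, `one_round`, and `iterative` are hypothetical.

```python
import numpy as np

def deim(V, taken=()):
    """Greedy DEIM selection: one index per column of V, never reusing `taken`."""
    taken, picked = list(taken), []
    for j in range(V.shape[1]):
        r = V[:, j].copy()
        if j > 0:
            # Subtract the part of v_j that the earlier vectors already
            # interpolate at the indices picked so far in this call.
            c = np.linalg.solve(V[picked, :j], V[picked, j])
            r -= V[:, :j] @ c
        score = np.abs(r)
        # Guard (an assumption of this sketch): exclude indices already chosen.
        score[np.array(taken + picked, dtype=int)] = -1.0
        picked.append(int(np.argmax(score)))
    return picked

def cur_residual(A, p, q):
    """Residual A - C M R with C = A[:, q], R = A[p, :], M = C^+ A R^+ (assumed)."""
    C, R = A[:, q], A[p, :]
    M = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return A - C @ M @ R

def one_round(A, k):
    """Select all k row and column indices from a single SVD of A."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return deim(U[:, :k]), deim(Vt[:k].T)

def iterative(A, k, b):
    """Select k indices in rounds of at most b, re-running DEIM on the residual."""
    p, q, E = [], [], A
    while len(p) < k:
        m = min(b, k - len(p))
        # Dense SVD for simplicity; an iterative solver for a few singular
        # vectors could be substituted for large matrices.
        U, _, Vt = np.linalg.svd(E, full_matrices=False)
        p += deim(U[:, :m], taken=p)
        q += deim(Vt[:m].T, taken=q)
        E = cur_residual(A, p, q)  # drop what the chosen rows/columns capture
    return p, q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic test matrix: low rank plus small noise (illustrative data only).
    A = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 40))
    A += 1e-2 * rng.standard_normal(A.shape)
    for name, (p, q) in [("one-round", one_round(A, 6)),
                         ("iterative", iterative(A, 6, 2))]:
        rel = np.linalg.norm(cur_residual(A, p, q)) / np.linalg.norm(A)
        print(f"{name}: rows={p}, cols={q}, relative error={rel:.3e}")
```

Which variant yields the smaller error depends on the matrix and the block size `b`; the sketch only illustrates the mechanics and makes no claim about the outcome on a particular dataset.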