A CUR factorization is often used as a substitute for the singular value decomposition (SVD), especially when the singular vectors are difficult to interpret concretely. Moreover, if the original data matrix has properties such as nonnegativity and sparsity, a CUR decomposition preserves them better than the SVD does. An essential aspect of this approach is the method used to select a subset of columns and rows from the original matrix. This study investigates the effectiveness of \emph{one-round sampling} and iterative subselection techniques, and introduces new iterative subselection strategies based on iterative SVDs. One provably appropriate technique for index selection in constructing a CUR factorization is the discrete empirical interpolation method (DEIM). Our contribution aims to improve the approximation quality of the DEIM scheme by invoking it iteratively over several rounds, so that subsequent columns and rows are selected based on the previously selected ones. That is, we modify the data matrix $A$ after each iteration by removing the information already captured by the previously selected columns and rows. We also discuss how iterative procedures for computing a few singular vectors of large data matrices can be integrated with the new iterative subselection strategies. We present numerical experiments that compare one-round sampling with iterative subselection and demonstrate the improved approximation quality of the latter.
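To make the distinction between one-round sampling and iterative subselection concrete, the following is a minimal NumPy sketch, not the exact algorithms of this study. It assumes a dense SVD in place of an iterative SVD solver, a middle factor $M = C^{+} A R^{+}$ (written `M` in the code; the abstract's "U"), and one particular way of "removing the captured information": projecting $A$ onto the orthogonal complement of the selected columns (respectively rows). The function names `deim`, `cur_deim`, and `cur_deim_iterative`, and the parameters `rounds` and `block`, are hypothetical.

```python
import numpy as np


def deim(V):
    """DEIM index selection: one index per column of V (n x k, full column rank)."""
    _, k = V.shape
    idx = [int(np.argmax(np.abs(V[:, 0])))]
    for j in range(1, k):
        # Interpolate column j at the chosen indices using columns 0..j-1,
        # then pick the entry where the interpolation residual is largest.
        coef = np.linalg.solve(V[idx, :j], V[idx, j])
        resid = V[:, j] - V[:, :j] @ coef
        idx.append(int(np.argmax(np.abs(resid))))
    return np.array(idx)


def cur_from_indices(A, p, q):
    """Build C, M, R from row indices p and column indices q (M = C^+ A R^+)."""
    C, R = A[:, q], A[p, :]
    M = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, M, R


def cur_deim(A, k):
    """One-round sampling: apply DEIM once to the k leading singular vectors."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    p = deim(U[:, :k])
    q = deim(Vt[:k].T)
    return (*cur_from_indices(A, p, q), p, q)


def cur_deim_iterative(A, rounds, block):
    """Iterative subselection (assumed variant): pick `block` rows and columns
    per round, then deflate A by what the current selection already captures."""
    p, q = [], []
    E_rows, E_cols = A, A
    for _ in range(rounds):
        U, _, _ = np.linalg.svd(E_rows, full_matrices=False)
        _, _, Vt = np.linalg.svd(E_cols, full_matrices=False)
        p += deim(U[:, :block]).tolist()
        q += deim(Vt[:block].T).tolist()
        C, R = A[:, q], A[p, :]
        # Assumption: deflate by orthogonal projection. The selected columns
        # (rows) of the residual become zero, so they are not selected again.
        E_cols = A - C @ (np.linalg.pinv(C) @ A)
        E_rows = A - (A @ np.linalg.pinv(R)) @ R
    p, q = np.array(p), np.array(q)
    return (*cur_from_indices(A, p, q), p, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scales = 0.5 ** np.arange(12)  # geometrically decaying column scales
    A = (rng.standard_normal((200, 12)) * scales) @ rng.standard_normal((12, 100))
    for name, (C, M, R, _, _) in {
        "one-round": cur_deim(A, 6),
        "iterative": cur_deim_iterative(A, rounds=3, block=2),
    }.items():
        err = np.linalg.norm(A - C @ M @ R) / np.linalg.norm(A)
        print(f"{name}: relative Frobenius error = {err:.3e}")
```

Both routines return the factors together with the selected row indices `p` and column indices `q`, so the two selections can be compared directly on the same matrix. For large matrices, the dense `np.linalg.svd` calls would presumably be replaced by an iterative solver that computes only a few singular vectors, which is the setting the abstract has in mind.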