Column selection is an essential tool for structure-preserving low-rank approximation, with applications in fields such as data science, machine learning, and theoretical chemistry. In this work, we develop unified methodologies for fast, efficient, and theoretically guaranteed column selection. First, we derive and implement a sparsity-exploiting deterministic algorithm applicable to tasks including kernel approximation and CUR decomposition. Next, we develop a matrix-free formalism that relies on a randomization scheme with guaranteed concentration bounds, and we apply this construction both to CUR decomposition and to the approximation of matrix functions of graph Laplacians. Importantly, the randomization enters only in the computation of the scores used for column selection, not in the selection itself once these scores are given. For both the deterministic and the matrix-free algorithms, we bound the performance favorably relative to the expected performance of determinantal point process (DPP) sampling and, in select scenarios, relative to that of exactly optimal subset selection. The general case requires a new analysis of the DPP expectation. Finally, we demonstrate strong real-world performance of our algorithms on a diverse set of approximation tasks.
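To illustrate the separation between randomized score computation and deterministic selection, the following is a minimal sketch and not the algorithm of this work. It assumes, purely for illustration, that the scores are squared column norms, estimated from Gaussian probes using only products with the transpose of the matrix; the function names, the choice of score, and the test matrix are all hypothetical. Once the scores are available, the selection step is a deterministic top-k rule, and the quality of the chosen columns is measured by projecting the matrix onto their span.

```python
import numpy as np

def estimate_column_scores(rmatvec, m, num_probes, rng):
    """Estimate squared column norms of an m-by-n matrix A using only
    products z -> A.T @ z (matrix-free). Illustrative score choice only."""
    total = None
    for _ in range(num_probes):
        z = rng.standard_normal(m)
        y = rmatvec(z)                 # y = A.T @ z, so E[y_j**2] = ||A[:, j]||**2
        total = y**2 if total is None else total + y**2
    return total / num_probes

def select_columns(scores, k):
    """Deterministic step: indices of the k largest scores, in sorted order."""
    return np.sort(np.argsort(scores)[-k:])

def column_projection(A, cols):
    """Project A onto the span of the selected columns: C @ pinv(C) @ A."""
    C = A[:, cols]
    coeffs = np.linalg.lstsq(C, A, rcond=None)[0]
    return C @ coeffs

# Hypothetical test matrix: two dominant columns plus small noise elsewhere.
rng = np.random.default_rng(0)
m, n = 50, 8
A = 0.01 * rng.standard_normal((m, n))
A[:, [2, 5]] += 10.0 * rng.standard_normal((m, 2))

# Randomness is used only here, to estimate the scores.
scores = estimate_column_scores(lambda z: A.T @ z, m, num_probes=100, rng=rng)

# Given the scores, the selection itself involves no randomness.
cols = select_columns(scores, k=2)

approx = column_projection(A, cols)
rel_err = np.linalg.norm(A - approx) / np.linalg.norm(A)
print("selected columns:", cols.tolist())
print("relative Frobenius error:", rel_err)
```

In this toy setting, rerunning the selection on the same scores always returns the same columns; only the score estimates vary with the random seed.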