Dense kernel matrices resulting from pairwise evaluations of a kernel function arise naturally in machine learning and statistics. Previous work on constructing sparse approximate inverse Cholesky factors of such matrices by minimizing Kullback-Leibler divergence recovers the Vecchia approximation for Gaussian processes. These methods rely only on the geometry of the evaluation points to construct the sparsity pattern. In this work, we instead construct the sparsity pattern with a greedy selection algorithm that, at each step, picks the point maximizing mutual information with the target points, conditional on all previously selected points. For selecting $k$ points out of $N$, the naive time complexity is $\mathcal{O}(N k^4)$, but by maintaining a partial Cholesky factor we reduce this to $\mathcal{O}(N k^2)$. Furthermore, for multiple ($m$) targets we achieve a time complexity of $\mathcal{O}(N k^2 + N m^2 + m^3)$, which is maintained in the setting of aggregated Cholesky factorization, where a selected point need not condition every target. We apply the selection algorithm to image classification and to the recovery of sparse Cholesky factors. By minimizing Kullback-Leibler divergence, we also apply the algorithm to Cholesky factorization, Gaussian process regression, and preconditioning for the conjugate gradient method, improving over $k$-nearest-neighbor selection.
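To make the $\mathcal{O}(N k^2)$ claim for a single target concrete, the following is a minimal sketch of greedy conditional selection, not the authors' implementation. It assumes a single target point and a squared-exponential kernel; the names `rbf_kernel` and `greedy_select` and the `length_scale` parameter are hypothetical. For jointly Gaussian variables, maximizing mutual information with the target is equivalent to maximizing the reduction in the target's conditional variance, so each step scores a candidate $j$ by $\mathrm{Cov}(y_j, y_{\text{target}} \mid \text{selected})^2 / \mathrm{Var}(y_j \mid \text{selected})$. Appending one column to a partial Cholesky factor updates all conditional quantities in $\mathcal{O}(N k)$ per step.

```python
import numpy as np


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel matrix between rows of x and rows of y (assumed kernel)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def greedy_select(x_train, x_target, k, kernel=rbf_kernel):
    """Greedily pick k training points that most reduce the target's conditional variance.

    Returns the selected indices and the target's variance conditional on them.
    """
    n = x_train.shape[0]
    points = np.vstack([x_train, x_target[None, :]])
    target = n  # the target is stored as the last row of `points`

    factor = np.zeros((n + 1, k))  # columns of the partial Cholesky factor
    # Conditional variance of every point given the (initially empty) selected set.
    cond_var = np.array([kernel(p[None, :], p[None, :])[0, 0] for p in points])
    # Conditional covariance of every point with the target.
    cond_cov = kernel(points, x_target[None, :])[:, 0]

    selected = []
    available = np.ones(n, dtype=bool)
    for i in range(k):
        # Variance reduction of the target obtained by conditioning on each candidate.
        usable = available & (cond_var[:n] > 1e-12)
        score = np.where(usable, cond_cov[:n] ** 2 / np.maximum(cond_var[:n], 1e-12), -np.inf)
        j = int(np.argmax(score))
        if not np.isfinite(score[j]):
            break  # no usable candidates remain
        selected.append(j)
        available[j] = False

        # New factor column: covariance with point j conditional on earlier selections,
        # scaled by the conditional standard deviation of point j. Costs O(N i).
        col = kernel(points, points[j : j + 1])[:, 0] - factor[:, :i] @ factor[j, :i]
        col /= np.sqrt(cond_var[j])
        factor[:, i] = col

        # Rank-one downdates of the conditional variances and target covariances.
        cond_cov -= col * col[target]
        cond_var -= col**2

    return selected, float(cond_var[target])
```

With a unit-diagonal kernel such as this one, the first selection coincides with the nearest neighbor of the target; later selections differ from $k$-nearest neighbors because candidates that are redundant given earlier picks receive low scores.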