This paper studies the use of kernel density estimation (KDE) for linear algebraic tasks involving the kernel matrix of a collection of $n$ data points in $\mathbb R^d$. In particular, we improve upon existing algorithms for computing the following quantities up to $(1+\varepsilon)$ relative error: matrix-vector products, matrix-matrix products, the spectral norm, and the sum of all entries. The runtimes of our algorithms depend on the dimension $d$, the number of points $n$, and the target error $\varepsilon$. Importantly, the dependence on $n$ in each case is far lower when the kernel matrix is accessed through KDE queries rather than by reading individual entries. Our improvements over the best existing algorithms for these tasks (particularly those of Backurs, Indyk, Musco, and Wagner '21) reduce the polynomial dependence on $\varepsilon$ and, in the case of computing the sum of all entries of the kernel matrix, also decrease the dependence on $n$. We complement our upper bounds with several lower bounds for related problems, which provide (conditional) quadratic-time hardness results and additionally hint at the limits of KDE-based approaches for the problems we study.
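To make the link between KDE queries and the kernel matrix concrete, the following minimal sketch checks the identity that underlies it: with the usual definition $\mathrm{KDE}(y) = \frac{1}{n}\sum_j k(y, x_j)$, the $i$-th row sum of the kernel matrix equals $n \cdot \mathrm{KDE}(x_i)$, so the sum of all entries equals $n \sum_i \mathrm{KDE}(x_i)$. Everything here is an illustrative assumption rather than the paper's method: the Gaussian kernel, the function names, and the brute-force exact `kde_query` all stand in for the fast approximate KDE oracle the abstract has in mind, and the sketch makes no attempt to reproduce the stated runtimes.

```python
import numpy as np


def gaussian_kernel(points, y):
    # Assumed kernel for illustration: k(x, y) = exp(-||x - y||^2).
    return np.exp(-np.sum((points - y) ** 2, axis=-1))


def kde_query(points, y):
    # Exact KDE value (1/n) * sum_j k(y, x_j), computed by brute force.
    # A hypothetical stand-in for a fast approximate KDE data structure.
    return float(np.mean(gaussian_kernel(points, y)))


def kernel_matrix(points):
    # Dense n x n kernel matrix, built only to verify the identity below.
    diffs = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diffs ** 2, axis=-1))


def sum_of_entries_via_kde(points):
    # Sum of all entries of K expressed through n KDE queries:
    # sum_{i,j} K_ij = n * sum_i KDE(x_i).
    n = len(points)
    return n * sum(kde_query(points, x) for x in points)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # n = 50 points in R^3
K = kernel_matrix(X)

exact_sum = float(K.sum())
kde_sum = sum_of_entries_via_kde(X)
row_sums_via_kde = np.array([len(X) * kde_query(X, x) for x in X])
```

With an exact oracle the two sums agree up to floating-point error. With an approximate oracle, each query would carry its own relative error, which is where the dependence on $\varepsilon$ enters.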