In the $\ell_p$-subspace sketch problem, we are given an $n\times d$ matrix $A$ with $n>d$, and asked to build a small-memory data structure $Q(A,\epsilon)$ so that, for any query vector $x\in\mathbb{R}^d$, we can output a number in $(1\pm\epsilon)\|Ax\|_p^p$ given only $Q(A,\epsilon)$. This problem is known to require $\tilde{\Omega}(d\epsilon^{-2})$ bits of memory for $d=\Omega(\log(1/\epsilon))$. However, for $d=o(\log(1/\epsilon))$, no data structure lower bounds were known. We resolve the memory required to solve the $\ell_p$-subspace sketch problem for any constant $d$ and integer $p$, showing that it is $\Omega(\epsilon^{-2(d-1)/(d+2p)})$ bits and $\tilde{O} (\epsilon^{-2(d-1)/(d+2p)})$ words. Thus, for any constant $d$, one can beat the $\Omega(\epsilon^{-2})$ lower bound that holds for $d = \Omega(\log(1/\epsilon))$. We also show how to implement the upper bound in a single-pass stream, with an additional multiplicative $\operatorname{poly}(\log \log n)$ factor and an additive $\operatorname{poly}(\log n)$ cost in the memory. Our bounds can be applied to point queries for SVMs with additive error, yielding an optimal bound of $\tilde{\Theta}(\epsilon^{-2d/(d+3)})$ for every constant $d$. This is a near-quadratic improvement over the $\Omega(\epsilon^{-(d+1)/(d+3)})$ lower bound of Andoni et al. (2020). Our techniques rely on a novel connection to low-dimensional techniques from geometric functional analysis.
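To make the stated exponents concrete, the following minimal sketch evaluates them for a few small values of $d$ and $p$. It only does arithmetic on the formulas quoted above and ignores the polylogarithmic factors hidden by the tilde notation. The function names are hypothetical and chosen for illustration; they do not come from the paper.

```python
from fractions import Fraction


def subspace_sketch_exponent(d: int, p: int) -> Fraction:
    """Exponent e in the stated bound eps^(-e), with e = 2(d-1)/(d+2p)."""
    if d < 1 or p < 1:
        raise ValueError("d and p must be positive integers")
    return Fraction(2 * (d - 1), d + 2 * p)


def svm_exponent(d: int) -> Fraction:
    """Exponent in the stated SVM point-query bound, 2d/(d+3)."""
    return Fraction(2 * d, d + 3)


def prior_svm_lower_exponent(d: int) -> Fraction:
    """Exponent in the earlier lower bound quoted above, (d+1)/(d+3)."""
    return Fraction(d + 1, d + 3)


if __name__ == "__main__":
    for d in (2, 3, 5, 10, 100):
        for p in (1, 2):
            e = subspace_sketch_exponent(d, p)
            # Always strictly below 2, since 2(d-1) < 2(d+2p).
            print(f"d={d:3d} p={p}: subspace sketch exponent = {e} ~ {float(e):.3f}")
        # The ratio simplifies to 2d/(d+1), which tends to 2 as d grows.
        ratio = svm_exponent(d) / prior_svm_lower_exponent(d)
        print(f"d={d:3d}: SVM exponent ratio (new / prior) = {ratio} ~ {float(ratio):.3f}")
```

The ratio of the two SVM exponents simplifies to $2d/(d+1)$, which approaches 2 as $d$ grows; this is the sense in which the improvement is near-quadratic. Likewise, $2(d-1)/(d+2p) < 2$ for every $d$ and $p \ge 1$, so the exponent stays strictly below the 2 of the $\epsilon^{-2}$ bound.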