Optimal Embedding Dimension for Sparse Subspace Embeddings

A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$ if, for any $d$-dimensional subspace $W\subseteq \mathbb{R}^n$, $P\big(\,\forall x\in W:\ (1+\epsilon)^{-1}\|x\|\leq\|Sx\|\leq (1+\epsilon)\|x\|\,\big)\geq 1-\delta.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and that for any $\theta > 0$, a Gaussian embedding matrix with $m\geq (1+\theta) d$ is an OSE with $\epsilon = O_\theta(1)$. However, no such optimal embedding dimension is known for other embeddings. Of particular interest are sparse OSEs, which have $s\ll m$ non-zeros per column and have applications to problems such as least squares regression and low-rank approximation. We show that, given any $\theta > 0$, an $m\times n$ random matrix $S$ with $m\geq (1+\theta)d$, consisting of randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspace embedding with $\epsilon = O_{\theta}(1)$. Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve $m=O(d)$ embedding dimension, and it improves on the $m=O(d\log(d))$ bound shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression. We further extend our results to construct even sparser non-oblivious embeddings, leading to the first subspace embedding with low distortion $\epsilon=o(1)$ and optimal embedding dimension $m=O(d/\epsilon^2)$ that can be applied in current matrix multiplication time.
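
To make the definition concrete, the following is a minimal numerical sketch, not the construction analyzed in the paper. It assumes one particular sparsification model (exactly $s$ non-zeros per column, placed uniformly at random, with independent random signs); the abstract only says "randomly sparsified", so this model, the function names `sparse_embedding` and `distortion`, and the parameter values are all illustrative assumptions. For a basis matrix $U$ with orthonormal columns spanning $W$, every $x\in W$ satisfies $\sigma_{\min}(SU)\|x\|\leq\|Sx\|\leq\sigma_{\max}(SU)\|x\|$, so the embedding condition holds for this $W$ with $1+\epsilon=\max\{\sigma_{\max}(SU),\,1/\sigma_{\min}(SU)\}$.

```python
import numpy as np

def sparse_embedding(m, n, s, rng):
    """Return an m x n array with exactly s entries equal to +-1/sqrt(s) per column.

    Assumed model: for each column, s distinct rows are chosen uniformly at
    random and given independent random signs (an illustrative choice).
    """
    S = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)   # s distinct row indices
        signs = rng.choice([-1.0, 1.0], size=s)       # independent random signs
        S[rows, j] = signs / np.sqrt(s)
    return S

def distortion(S, U):
    """Smallest and largest singular values of S @ U (U has orthonormal columns)."""
    sv = np.linalg.svd(S @ U, compute_uv=False)
    return sv.min(), sv.max()

rng = np.random.default_rng(0)
n, d, theta, s = 2000, 50, 1.0, 8      # hypothetical sizes for illustration
m = int((1 + theta) * d)               # embedding dimension m = (1 + theta) d

# Orthonormal basis of a random d-dimensional subspace W of R^n.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

S = sparse_embedding(m, n, s, rng)
lo, hi = distortion(S, U)
print(f"sigma_min(SU) = {lo:.3f}, sigma_max(SU) = {hi:.3f}")
```

The array is stored densely here only for brevity; a sparse format would be needed to benefit from the $s\ll m$ non-zeros per column. A single run checks one subspace and one draw of $S$, so it illustrates the definition but does not verify the probability bound $1-\delta$.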
