Optimal Embedding Dimension for Sparse Subspace Embeddings

A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$ if, for any $d$-dimensional subspace $W\subseteq \mathbb{R}^n$, $P\big(\,\forall\, x\in W:\ (1+\epsilon)^{-1}\|x\|\leq\|Sx\|\leq (1+\epsilon)\|x\|\,\big)\geq 1-\delta.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and that for any $\theta > 0$, a Gaussian embedding matrix with $m\geq (1+\theta) d$ is an OSE with $\epsilon = O_\theta(1)$. However, such an optimal embedding dimension is not known for other embeddings. Of particular interest are sparse OSEs, which have $s\ll m$ non-zeros per column and have applications to problems such as least squares regression and low-rank approximation. We show that, given any $\theta > 0$, an $m\times n$ random matrix $S$ with $m\geq (1+\theta)d$, consisting of randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspace embedding with $\epsilon = O_{\theta}(1)$. Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve $m=O(d)$ embedding dimension, and it improves on the $m=O(d\log(d))$ bound shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression. We further extend our results to Leverage Score Sparsification (LESS), a recently introduced non-oblivious embedding technique. We use LESS to construct the first subspace embedding with low distortion $\epsilon=o(1)$ and optimal embedding dimension $m=O(d/\epsilon^2)$ that can be applied in current matrix multiplication time.
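
To make the OSE definition concrete, the following is a minimal NumPy sketch, not the paper's construction or analysis. It assumes one simple way of sparsifying: each column gets exactly $s$ non-zero entries at uniformly random rows, each equal to $\pm1/\sqrt s$. The helper names `sparse_embedding` and `distortion`, and the values of `n`, `d`, `theta` and `s`, are illustrative assumptions. For a fixed subspace with orthonormal basis $U$, the smallest valid $\epsilon$ can be read off from the extreme singular values of $SU$.

```python
import numpy as np

def sparse_embedding(m, n, s, rng):
    """Return an m x n array with exactly s non-zeros per column, each +-1/sqrt(s).

    Assumption: non-zero positions are drawn uniformly without replacement
    in every column; the paper's sparsification scheme may differ.
    """
    S = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)   # s distinct row indices
        signs = rng.choice([-1.0, 1.0], size=s)       # independent random signs
        S[rows, j] = signs / np.sqrt(s)
    return S

def distortion(S, U):
    """Smallest eps such that (1+eps)^-1 ||x|| <= ||Sx|| <= (1+eps) ||x||
    for all x in the column span of U, where U has orthonormal columns."""
    sv = np.linalg.svd(S @ U, compute_uv=False)
    if sv.min() == 0.0:
        return np.inf                                  # S collapses a direction of the subspace
    return max(sv.max(), 1.0 / sv.min()) - 1.0

rng = np.random.default_rng(0)
n, d, theta, s = 2000, 50, 1.0, 8                      # illustrative values only
m = int((1 + theta) * d)                               # embedding dimension m = (1 + theta) d

# A random d-dimensional subspace of R^n, represented by an orthonormal basis.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

S = sparse_embedding(m, n, s, rng)
eps = distortion(S, U)
print(f"m = {m}, s = {s}, empirical distortion eps = {eps:.3f}")
```

This checks a single random subspace only; the OSE property is a statement about every $d$-dimensional subspace holding with probability at least $1-\delta$, which a numerical experiment of this kind cannot certify.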
