An oblivious subspace embedding is a random $m\times n$ matrix $\Pi$ such that, for any $d$-dimensional subspace, with high probability $\Pi$ preserves the norms of all vectors in that subspace to within a $1\pm\epsilon$ factor. In this work, we give an oblivious subspace embedding with the optimal dimension $m=\Theta(d/\epsilon^2)$ and a near-optimal sparsity of $\tilde O(1/\epsilon)$ non-zero entries per column of $\Pi$. This is the first result to nearly match the conjecture of Nelson and Nguyen [FOCS 2013] on the best sparsity attainable by an optimal oblivious subspace embedding, improving on a prior bound of $\tilde O(1/\epsilon^6)$ non-zeros per column [Chenakkod et al., STOC 2024]. We further extend our approach to the non-oblivious setting, proposing a new family of Leverage Score Sparsified embeddings with Independent Columns, which yield faster runtimes for matrix approximation and regression tasks. In our analysis, we develop a new method that combines a decoupling argument with the cumulant method to bound the edge universality error of isotropic random matrices. To achieve near-optimal sparsity, we combine this general-purpose approach with new trace inequalities that leverage the specific structure of our subspace embedding construction.
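To make the definition concrete, the following is a minimal numerical sketch, not the construction of this work. It assumes a generic sparse sign embedding with exactly `s` non-zeros per column, each equal to $\pm 1/\sqrt{s}$, and measures the distortion $\epsilon$ on a random $d$-dimensional subspace through the singular values of $\Pi U$, where $U$ is an orthonormal basis of the subspace. The function names and the parameter values `n`, `d`, `m`, `s` are illustrative assumptions.

```python
import numpy as np


def sparse_embedding(m, n, s, rng):
    """Return an m x n matrix with exactly s non-zeros per column.

    Each non-zero is +1/sqrt(s) or -1/sqrt(s), placed in s distinct
    rows chosen uniformly at random, so every column has unit norm.
    (Illustrative construction; an assumption, not the paper's.)
    """
    Pi = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        signs = rng.choice([-1.0, 1.0], size=s)
        Pi[rows, j] = signs / np.sqrt(s)
    return Pi


def embedding_distortion(Pi, U):
    """Smallest eps with (1-eps)||x|| <= ||Pi x|| <= (1+eps)||x||
    for every x in the column span of U (U has orthonormal columns).

    For x = U c we have ||x|| = ||c||, so the extreme ratios
    ||Pi x|| / ||x|| are the extreme singular values of Pi @ U.
    """
    sv = np.linalg.svd(Pi @ U, compute_uv=False)
    return max(sv.max() - 1.0, 1.0 - sv.min())


rng = np.random.default_rng(0)
n, d, m, s = 2000, 10, 400, 4  # hypothetical sizes for illustration

# Orthonormal basis of a random d-dimensional subspace of R^n.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

Pi = sparse_embedding(m, n, s, rng)
eps = embedding_distortion(Pi, U)
print(f"non-zeros per column: {s}, measured distortion: {eps:.3f}")
```

Because the subspace is drawn independently of $\Pi$, this mirrors the oblivious setting; the sparsity `s` controls the cost of applying $\Pi$, which is proportional to `s` times the number of non-zeros in the input.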