The Johnson--Lindenstrauss (JL) lemma is a powerful tool for dimensionality reduction in modern algorithm design. It states that any finite set of points in a high-dimensional Euclidean space can be mapped into a lower-dimensional space while approximately preserving pairwise Euclidean distances. Random matrices satisfying this lemma are called JL transforms (JLTs). Inspired by existing $s$-hashing JLTs, which have exactly $s$ nonzero elements in each column, the present work introduces an ensemble of sparse matrices that includes so-called $s$-hashing-like matrices, whose expected number of nonzero elements in each column is~$s$. The independence of the sub-Gaussian entries of these matrices and the knowledge of their exact distribution play an important role in their analysis. Using properties of independent sub-Gaussian random variables, these matrices are shown to be JLTs, and their smallest and largest singular values are estimated non-asymptotically using a technique from geometric functional analysis. As the dimensions of the matrix grow to infinity, these singular values are proved to converge almost surely to fixed quantities (by the universal Bai--Yin law) and, after proper rescaling, to converge in distribution to the Gaussian orthogonal ensemble (GOE) Tracy--Widom law. Understanding the behavior of extreme singular values is important in general because they are often used to define a measure of stability of matrix algorithms. For example, JLTs were recently used in derivative-free optimization frameworks to select random subspaces in which random models or poll directions are constructed, in order to achieve scalability; estimating their smallest singular value in particular therefore helps determine the dimension of these subspaces.
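As a rough illustration of the kind of matrix described above, the following Python sketch builds an $n \times d$ sparse matrix with independent entries and checks empirically how well it preserves pairwise distances. The entry distribution used here ($\pm 1/\sqrt{s}$ with probability $s/(2n)$ each, and $0$ otherwise) is an assumption chosen so that each column has $s$ nonzero entries in expectation and each entry has variance $1/n$; the abstract does not spell out the exact distribution. The function names `s_hashing_like` and `pairwise_dists` and the dimensions are hypothetical.

```python
import numpy as np


def s_hashing_like(n, d, s, rng):
    """Return an n x d matrix with independent entries (assumed distribution):
    +1/sqrt(s) or -1/sqrt(s) with probability s/(2n) each, and 0 otherwise.
    Each column then has s nonzero entries in expectation."""
    if not 0 < s <= n:
        raise ValueError("need 0 < s <= n")
    mask = rng.random((n, d)) < s / n            # entry is nonzero w.p. s/n
    signs = rng.choice([-1.0, 1.0], size=(n, d))  # independent random signs
    return mask * signs / np.sqrt(s)


def pairwise_dists(P):
    """Euclidean distances between all distinct pairs of columns of P."""
    diff = P[:, :, None] - P[:, None, :]
    D = np.linalg.norm(diff, axis=0)
    i, j = np.triu_indices(P.shape[1], k=1)
    return D[i, j]


rng = np.random.default_rng(0)
n, d, s = 256, 2000, 8           # embedding dim, ambient dim, sparsity level
H = s_hashing_like(n, d, s, rng)

X = rng.standard_normal((d, 20))  # 20 points in R^d, stored as columns
ratios = pairwise_dists(H @ X) / pairwise_dists(X)

# Empirical column sparsity and distance distortion.
avg_nnz = np.count_nonzero(H, axis=0).mean()
print("average nonzeros per column:", avg_nnz)
print("distance ratio range:", ratios.min(), ratios.max())

# Extreme singular values of H; for entries of variance 1/n, a rough
# reference point is sqrt(d/n) -/+ 1 (Bai--Yin-type heuristic).
sv = np.linalg.svd(H, compute_uv=False)
print("smallest / largest singular value:", sv.min(), sv.max())
print("reference values:", np.sqrt(d / n) - 1, np.sqrt(d / n) + 1)
```

The distance ratios should cluster around 1, with a spread that shrinks as `n` grows. The printed singular values are only meant for eyeballing against the reference values; at these small dimensions and this sparsity level the agreement may be loose.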