The Johnson--Lindenstrauss (JL) lemma is a powerful tool for dimensionality reduction in modern algorithm design. The lemma states that any set of high-dimensional points in a Euclidean space can be embedded into a lower-dimensional space while approximately preserving pairwise Euclidean distances. Random matrices satisfying this lemma are called JL transforms (JLTs). Inspired by existing $s$-hashing JLTs, which have exactly $s$ nonzero elements in each column, the present work introduces an ensemble of sparse matrices encompassing so-called $s$-hashing-like matrices, whose expected number of nonzero elements in each column is~$s$. The independence of the sub-Gaussian entries of these matrices and the knowledge of their exact distribution play an important role in their analysis. Using properties of independent sub-Gaussian random variables, these matrices are shown to be JLTs, and their smallest and largest singular values are estimated non-asymptotically using a technique from geometric functional analysis. As the dimensions of the matrix grow to infinity, these singular values are proved to converge almost surely to fixed quantities (by the universal Bai--Yin law) and, after proper rescaling, to converge in distribution to the Gaussian orthogonal ensemble (GOE) Tracy--Widom law. Understanding the behavior of extreme singular values is important in general because they are often used to define a measure of stability of matrix algorithms. For example, JLTs were recently used in derivative-free optimization frameworks to select random subspaces in which random models or poll directions are constructed to achieve scalability, so estimating their smallest singular value in particular helps determine the dimension of these subspaces.
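To make the idea of an $s$-hashing-like matrix concrete, the following is a minimal sketch rather than the construction analyzed in the paper. The abstract only states that the entries are independent and sub-Gaussian and that each column has $s$ nonzero elements in expectation; the specific entry distribution used here (each entry equals $\pm 1/\sqrt{s}$ with probability $s/(2n)$ each and $0$ otherwise, for an $n \times N$ matrix) is an assumption chosen to satisfy those two properties. The function names `s_hashing_like`, `matvec`, and `dist` are hypothetical.

```python
import math
import random


def s_hashing_like(n, N, s, rng):
    """Sample an n-by-N matrix with independent entries in {-1/sqrt(s), 0, +1/sqrt(s)}.

    Assumed distribution (illustrative only): each entry is nonzero with
    probability s/n and, when nonzero, takes either sign with equal probability.
    Each column then has s nonzero entries in expectation.
    """
    if not 0 < s <= n:
        raise ValueError("need 0 < s <= n")
    scale = 1.0 / math.sqrt(s)
    p = s / n
    rows = []
    for _ in range(n):
        row = []
        for _ in range(N):
            if rng.random() < p:
                row.append(scale if rng.random() < 0.5 else -scale)
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def matvec(H, x):
    """Multiply the matrix H (list of rows) by the vector x, skipping zero entries."""
    return [sum(h * xi for h, xi in zip(row, x) if h != 0.0) for row in H]


def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(0)  # fixed seed so the run is reproducible
n, N, s = 256, 1000, 8  # embed from dimension N down to dimension n
H = s_hashing_like(n, N, s, rng)

# Average number of nonzero entries per column; its expectation is s.
avg_nnz = sum(1 for row in H for h in row if h != 0.0) / N

# Ratio of embedded distance to original distance for one pair of points.
x = [rng.gauss(0.0, 1.0) for _ in range(N)]
y = [rng.gauss(0.0, 1.0) for _ in range(N)]
ratio = dist(matvec(H, x), matvec(H, y)) / dist(x, y)

print(f"average nonzeros per column: {avg_nnz:.2f} (target s = {s})")
print(f"distance ratio after embedding: {ratio:.3f}")
```

Under this assumed distribution each entry has variance $1/n$, so the squared norm of an embedded vector equals the original squared norm in expectation; the printed ratio should therefore be close to 1, with fluctuations that shrink as $n$ grows. The dense list-of-rows storage is for readability only and does not exploit sparsity.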