We extend the adaptive, partially matrix-free Hierarchically Semi-Separable (HSS) matrix construction algorithm of Gorman et al. [SIAM J. Sci. Comput. 41(5), 2019], which uses Gaussian sketching operators, to a broader class of Johnson--Lindenstrauss (JL) sketching operators. We develop theory that justifies this extension. In particular, we extend the earlier concentration bounds to all JL sketching operators and examine the resulting bound for specific classes of such operators, including the original Gaussian sketching operators, the subsampled randomized Hadamard transform (SRHT), and the sparse Johnson--Lindenstrauss transform (SJLT). We discuss implementation details for applying the SJLT and SRHT efficiently. We then demonstrate experimentally that using SJLT or SRHT instead of Gaussian sketching operators yields speedups of up to 2.5x in the serial HSS construction implementation in the STRUMPACK C++ library. Additionally, we discuss the implementation of a parallel distributed HSS construction that uses Gaussian or SJLT sketching operators. We observe a performance improvement of up to 35x when using SJLT sketching operators instead of Gaussian sketching operators. The generalized algorithm allows users to select their own JL sketching operators, with theoretical lower bounds on the size of the operators, which may lead to faster run times with similar HSS construction accuracy.
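To illustrate why a sparse sketching operator can be cheaper to apply than a dense Gaussian one, the following is a minimal sketch in pure Python. It is not the STRUMPACK implementation. It assumes one common SJLT convention: the operator R is n x d, every row of R has exactly s nonzeros placed in distinct random columns, and each nonzero is +1/sqrt(s) or -1/sqrt(s). The function names `sjlt_nonzeros` and `apply_sjlt` and the parameter names are hypothetical and chosen only for this example.

```python
import math
import random


def sjlt_nonzeros(n, d, s, rng):
    """Describe a sparse n x d operator R row by row (assumed convention).

    Each row gets s distinct random columns with values +-1/sqrt(s),
    stored as a list of (column, value) pairs.
    """
    scale = 1.0 / math.sqrt(s)
    return [
        [(col, rng.choice((-scale, scale))) for col in rng.sample(range(d), s)]
        for _ in range(n)
    ]


def apply_sjlt(A, rows, d):
    """Compute Y = A R for a dense m x n matrix A (list of lists).

    Only the s nonzeros in each row of R are visited, so the cost is
    O(m * n * s) rather than O(m * n * d) for a dense operator.
    """
    Y = [[0.0] * d for _ in A]
    for i, a_row in enumerate(A):
        y_row = Y[i]
        for k, a_ik in enumerate(a_row):
            if a_ik == 0.0:
                continue
            for col, val in rows[k]:
                y_row[col] += a_ik * val
    return Y


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, s = 50, 16, 4
    rows = sjlt_nonzeros(n, d, s, rng)
    # Sketch a small dense matrix with 3 rows and n columns.
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
    Y = apply_sjlt(A, rows, d)
    print(len(Y), len(Y[0]))
```

With s much smaller than d, the inner loop touches far fewer entries than a dense product would, which is the intuition behind the reported speedups; the actual gains depend on the implementation and on the sketch sizes required by the theory.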