Randomized numerical linear algebra (RandNLA, for short) concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lie in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more efficiently than deterministic algorithms. This idea proved fruitful in the development of scalable algorithms for machine learning and statistical data analysis applications. However, RandNLA's true potential only came into focus upon integration with the fields of numerical analysis and "classical" numerical linear algebra. Through the efforts of many individuals, randomized algorithms have been developed that provide full control over the accuracy of their solutions and that can be every bit as reliable as algorithms that might be found in libraries such as LAPACK. Recent years have even seen the incorporation of certain RandNLA methods into MATLAB, the NAG Library, NVIDIA's cuSOLVER, and SciPy. For all its success, we believe that RandNLA has yet to realize its full potential. In particular, we believe the scientific community stands to benefit significantly from suitably defined "RandBLAS" and "RandLAPACK" libraries, to serve as standards conceptually analogous to BLAS and LAPACK. This 200-page monograph represents a step toward defining such standards. In it, we cover topics spanning basic sketching, least squares and optimization, low-rank approximation, full matrix decompositions, leverage score sampling, and sketching data with tensor product structures (among others). Much of the provided pseudocode has been tested via publicly available MATLAB and Python implementations.