Randomized numerical linear algebra (RandNLA, for short) concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lie in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more efficiently than deterministic algorithms. This idea proved fruitful in the development of scalable algorithms for machine learning and statistical data analysis applications. However, RandNLA's true potential only came into focus upon integration with the fields of numerical analysis and "classical" numerical linear algebra. Through the efforts of many individuals, randomized algorithms have been developed that provide full control over the accuracy of their solutions and that can be every bit as reliable as algorithms found in libraries such as LAPACK. Recent years have even seen the incorporation of certain RandNLA methods into MATLAB, the NAG Library, NVIDIA's cuSOLVER, and scikit-learn. For all its success, we believe that RandNLA has yet to realize its full potential. In particular, we believe the scientific community stands to benefit significantly from suitably defined "RandBLAS" and "RandLAPACK" libraries that would serve as standards conceptually analogous to BLAS and LAPACK. This 200-page monograph represents a step toward defining such standards. In it, we cover topics spanning basic sketching, least squares and optimization, low-rank approximation, full matrix decompositions, leverage score sampling, and sketching data with tensor product structures, among others. Much of the provided pseudo-code has been tested via publicly available MATLAB and Python implementations.
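To make the core idea concrete, here is a minimal Python/NumPy sketch of one classic randomized technique, a Gaussian-sketch range finder for low-rank approximation. It is an illustration of the general approach, not the monograph's own pseudo-code. The function name `randomized_low_rank`, the Gaussian test matrix, and the default oversampling of 10 columns are illustrative assumptions.

```python
import numpy as np

def randomized_low_rank(A, k, oversample=10, seed=None):
    """Return (Q, B) with A approximately equal to Q @ B.

    Hypothetical helper for illustration: a basic randomized range
    finder using a Gaussian test matrix with `oversample` extra columns.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = min(n, k + oversample)           # sketch size: target rank plus oversampling
    Omega = rng.standard_normal((n, s))  # random Gaussian test matrix
    Y = A @ Omega                        # sketch capturing the dominant range of A
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for the sketched range (m x s)
    B = Q.T @ A                          # project A onto that basis (s x n)
    return Q, B

# Usage: an exactly rank-5 matrix should be recovered to near machine precision.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
Q, B = randomized_low_rank(A, k=5, seed=1)
rel_err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
print(f"relative error: {rel_err:.2e}")
```

The expensive work is one product of `A` with a thin random matrix plus a QR factorization of an m-by-s matrix, rather than a full SVD of `A`. For matrices that are only approximately low rank, the accuracy of this basic scheme depends on how quickly the singular values decay; power iterations are a common refinement.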