As the scale of the problems and data used in experimental design, signal processing, and data assimilation grows, the frequently occurring least squares subproblems grow correspondingly in size. Because the scale of these least squares problems creates prohibitive memory movement costs for the usual incremental QR and Krylov-based algorithms, randomized least squares solvers are garnering more attention. However, these randomized solvers are difficult to integrate into application algorithms, as their uncertainty limits practical tracking of algorithmic progress and reliable stopping. Accordingly, in this work, we develop theoretically rigorous, practical tools for quantifying the uncertainty of an important class of iterative randomized least squares algorithms, which we then use to track algorithmic progress and create a stopping condition. We demonstrate the effectiveness of our algorithm by solving a 0.78 TB least squares subproblem from the inner loop of incremental 4D-Var using only 195 MB of memory.