Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems that may be underdetermined, inconsistent, or both. In such cases, one generally seeks the least squares solution that minimizes the residual of the problem; when the coefficient matrix has a nontrivial nullspace, this solution is further constrained to be the one of smallest norm. This work presents several new techniques for solving least squares problems involving coefficient matrices that are so large that they do not fit in main memory. The implementations include both CPU and GPU variants. All techniques rely on complete orthogonal decompositions, which guarantee that both conditions of a least squares solution are met regardless of the rank properties of the matrix. Specifically, they rely on the recently proposed "randUTV" algorithm, which is particularly effective in strongly communication-constrained environments. A detailed precision and performance study reveals that the new methods, which operate on data stored on disk, are competitive with state-of-the-art methods that store all data in main memory.
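The minimum-norm least squares solution described above can be illustrated with a small in-memory sketch. This is not the paper's randUTV-based out-of-core solver; it uses NumPy's SVD-based `lstsq` as a stand-in for a complete orthogonal decomposition, on a deliberately rank-deficient matrix, to show that the returned solution both minimizes the residual and has the smallest norm among all minimizers.

```python
import numpy as np

# Illustration only: a small rank-deficient system, not the paper's
# out-of-core randUTV solver. A is 6x4 with rank 2, so it has a
# nontrivial nullspace and the least squares problem has infinitely
# many minimizers.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 2))
C = rng.standard_normal((2, 4))
A = B @ C                      # rank(A) == 2
b = rng.standard_normal(6)

# SVD-based lstsq returns the minimum-norm least squares solution.
x, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# The same solution is given by the Moore-Penrose pseudoinverse.
x_pinv = np.linalg.pinv(A) @ b
assert np.allclose(x, x_pinv)

# Shifting x along a nullspace direction leaves the residual
# unchanged but strictly increases the solution norm.
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[-1]              # right singular vector for a zero singular value
x_shift = x + null_vec
assert np.allclose(A @ x, A @ x_shift)
assert np.linalg.norm(x) < np.linalg.norm(x_shift)
```

Because `x` is orthogonal to the nullspace of `A`, adding any nullspace component `v` gives `||x + v||^2 = ||x||^2 + ||v||^2`, which is exactly why the minimum-norm condition pins down a unique solution.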