Krylov subspace methods are widely used in scientific computing to solve large-scale linear systems. However, the performance of these iterative solvers on modern supercomputers is limited by the high cost of communication. The $s$-step strategy avoids communication by generating $s$ Krylov basis vectors at a time, which can asymptotically reduce communication latency by a factor of $s$. Unfortunately, in finite-precision arithmetic the step size must be kept small to preserve stability. In this work, we address the numerical instabilities encountered in the $s$-step GMRES algorithm. By choosing an appropriate polynomial basis and block orthogonalization schemes, we construct a communication-avoiding $s$-step GMRES algorithm that automatically selects the optimal step size to ensure numerical stability. To further maximize communication savings, we introduce scaled Newton polynomials, which can increase the step size $s$ to a few hundred for many problems. We also develop an initial step size estimator that efficiently chooses the optimal step size for stability. Numerical experiments demonstrate the guaranteed stability of the proposed algorithm; in the process, we also evaluate how the choice of polynomial basis and preconditioning affects the stability limit of the algorithm. Finally, we demonstrate parallel scalability on more than 14,000 cores in a distributed-memory setting, observing perfectly linear scaling in both strong and weak scaling studies with negligible communication costs.
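The abstract does not spell out how an $s$-step method builds its basis, so the following is only a minimal sketch of the generic Newton-basis idea, not the paper's algorithm. It assumes real shifts $\theta_j$ supplied by the caller (in practice these are often Ritz values from a few preliminary Arnoldi steps, usually Leja-ordered) and one simple scaling choice: each new vector is normalized by $\sigma_j$. That choice is not necessarily the paper's "scaled Newton polynomials". The function names `newton_krylov_basis` and `change_of_basis` are hypothetical. The sketch generates $v_{j+1} = (A v_j - \theta_j v_j)/\sigma_j$ for all $s$ steps without any inner products between vectors. It also forms the $(s+1)\times s$ matrix $B$ satisfying $A V_s = V_{s+1} B$, which is what lets the block be orthogonalized afterwards in a single communication phase. With all shifts zero and no scaling, the basis reduces to the monomial basis $v, Av, A^2 v, \dots$.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product y = A x (stand-in for a distributed SpMV)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def norm2(x: Vector) -> float:
    """Euclidean norm of x."""
    return sum(xi * xi for xi in x) ** 0.5


def newton_krylov_basis(
    A: Matrix, v: Vector, shifts: Sequence[float], scale: bool = True
) -> Tuple[List[Vector], List[float]]:
    """Hypothetical helper: build s+1 Newton-basis vectors in one sweep.

    v_{j+1} = (A v_j - theta_j v_j) / sigma_j for j = 0..s-1, where
    theta_j = shifts[j] and sigma_j = ||A v_j - theta_j v_j|| when scale is
    True (sigma_j = 1 otherwise). No orthogonalization is done here.
    """
    basis = [list(v)]
    sigmas: List[float] = []
    for theta in shifts:
        vj = basis[-1]
        # Shifted matrix-vector product: w = (A - theta I) v_j.
        w = [awi - theta * vi for awi, vi in zip(matvec(A, vj), vj)]
        sigma = norm2(w) if scale else 1.0
        if sigma == 0.0:
            raise ValueError("breakdown: Krylov subspace became invariant")
        sigmas.append(sigma)
        basis.append([wi / sigma for wi in w])
    return basis, sigmas


def change_of_basis(shifts: Sequence[float], sigmas: Sequence[float]) -> Matrix:
    """Hypothetical helper: (s+1) x s matrix B with A [v_0..v_{s-1}] = [v_0..v_s] B."""
    s = len(shifts)
    B = [[0.0] * s for _ in range(s + 1)]
    for j in range(s):
        B[j][j] = shifts[j]        # coefficient of v_j in A v_j
        B[j + 1][j] = sigmas[j]    # coefficient of v_{j+1} in A v_j
    return B
```

For example, `newton_krylov_basis([[2.0, 1.0], [0.0, 3.0]], [1.0, 1.0], [0.0, 0.0], scale=False)` returns the unscaled monomial basis `[[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]]`. That example shows why the monomial basis limits $s$: its columns grow or shrink geometrically and quickly become numerically dependent. Shifting and scaling keep every column at unit norm, which is one plausible reason scaled bases tolerate much larger step sizes.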