Krylov subspace methods are widely used in scientific computing to solve large-scale linear systems. However, the performance of these iterative Krylov solvers on modern supercomputers is limited by expensive communication costs. The $s$-step strategy generates a series of $s$ Krylov basis vectors at a time in order to avoid communication; asymptotically, the $s$-step approach can reduce communication latency by a factor of $s$. Unfortunately, in finite-precision arithmetic, the step size must be kept small for stability. In this work, we address the numerical instabilities encountered in the $s$-step GMRES algorithm. By choosing an appropriate polynomial basis and block orthogonalization scheme, we construct a communication-avoiding $s$-step GMRES algorithm that automatically selects the optimal step size to ensure numerical stability. To further maximize communication savings, we introduce scaled Newton polynomials, which can increase the step size $s$ to a few hundred for many problems. We also develop an initial step-size estimator that efficiently chooses the optimal step size for stability. Numerical experiments demonstrate the guaranteed stability of the proposed algorithm; in the process, we also evaluate how the choice of polynomial basis and preconditioner affects the stability limit of the algorithm. Finally, we show parallel scalability on more than 114,000 cores in a distributed-memory setting, observing linear scaling in both strong- and weak-scaling studies with negligible communication costs.