Multigrid solvers are the standard in modern scientific computing simulations. Domain Decomposition Aggregation-Based Algebraic Multigrid, also known as the DD-$\alpha$AMG solver, is a successful realization of an algebraic multigrid solver for lattice quantum chromodynamics. Its CPU implementation has made possible, for some particular discretizations, simulations that would otherwise be computationally infeasible, and it has furthermore motivated the development and improvement of other algebraic multigrid solvers in the area. Starting from an existing version of DD-$\alpha$AMG that was already partially ported via CUDA to run some finest-level operations of the multigrid solver on Nvidia GPUs, we translate the CUDA code using HIP so that it runs on the ORISE supercomputer. We moreover extend the smoothers available in DD-$\alpha$AMG, paying particular attention to Richardson smoothing, which in our numerical experiments leads to a multigrid solver that is faster than with GCR smoothing and only 10% slower than with SAP smoothing. We then port the odd-even-preconditioned versions of GMRES and Richardson via CUDA. Finally, we extend some computationally intensive coarse-grid operations via advanced vectorization.