We present and release as open source a sparse linear solver that efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large, sparse linear systems on modern parallel computers composed of hybrid nodes hosting NVIDIA Graphics Processing Unit (GPU) accelerators. This work extends our previous efforts on exploiting a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. The design of the hybrid implementation follows best practices for minimizing data communication overhead when multiple GPUs are employed, while preserving the efficiency of the single-GPU kernels. We discuss strong and weak scalability results of the new version of the library on well-known benchmark test cases. Comparisons with the NVIDIA AmgX solution show an improvement of up to 2.0x in the solve phase.