Batched linear solvers, which solve many independent linear systems concurrently, play a vital role in computational science, especially in plasma physics and combustion simulations. With the imminent deployment of the Aurora supercomputer and other upcoming systems equipped with Intel GPUs, there is a compelling need to extend these solvers to Intel GPU architectures. In this paper, we present our work on porting and optimizing batched iterative solvers for Intel GPUs using the SYCL programming model. The SYCL-based implementation shows strong performance and scalability on the Intel Data Center GPU Max 1550 (Ponte Vecchio). For the PeleLM application inputs, the solvers outperform our previous CUDA implementation on NVIDIA H100 GPUs by an average of 2.4x. The batched solvers are ready for production use in real-world scientific applications through the Ginkgo library.