Batched linear solvers play a vital role in computational science, especially in plasma physics and combustion simulations. With the imminent deployment of the Aurora supercomputer and other upcoming systems equipped with Intel GPUs, there is a compelling need to extend these solvers to Intel GPU architectures. In this paper, we present our efforts in porting and optimizing batched iterative solvers for Intel GPUs using the SYCL programming model. On the Intel GPU Max 1550 (Ponte Vecchio), the new solvers outperform our previous CUDA implementation on NVIDIA H100 GPUs by an average of 2.4x for the PeleLM application inputs. The batched solvers are ready for production use in real-world scientific applications through the Ginkgo library, complementing the performance portability of Ginkgo's batched functionality.