Simulations based on particle methods, such as Smoothed Particle Hydrodynamics (SPH), are known to be computationally demanding. While such methods have long been executed in parallel on multi-core CPUs, recent years have seen the increasing adoption of many-core accelerators, such as GPUs. However, hardware fragmentation and vendor-specific programming interfaces still characterize the accelerator market. Hence, supporting various hardware configurations can easily lead to non-trivial and less maintainable implementations. To leverage the higher-level specifications that have recently become available, such as the SYCL programming standard, this work highlights the initial effort in adopting SYCL for the execution of SPHinXsys, an open-source multi-physics library. The result is an execution model able to run the same implementation on heterogeneous hardware, with considerable speed-up compared to the current multi-core CPU parallelization. Among other topics, the representation of data structures for parallel access, communication strategies, and parallel methods for data sorting are discussed in depth. Benchmarks are also presented, comparing the performance of the current multi-core CPU implementation with the newly introduced SYCL parallelization with a GPU back-end.