Smoothed Particle Hydrodynamics (SPH) is essential for modeling complex large-deformation problems across various applications, but it requires significant computational power. A major portion of SPH computation time is spent on the Nearest Neighboring Particle Search (NNPS). While advanced NNPS algorithms have been developed to improve SPH efficiency, the potential gains from modern computing hardware remain underexplored. This study investigates how GPU parallel architecture, low-precision computing on GPUs, and GPU memory management affect NNPS efficiency. Our approach employs a GPU-accelerated mixed-precision SPH framework that uses low-precision 16-bit floating point (FP16) for NNPS while maintaining high precision for the other components. To preserve accuracy when NNPS runs in FP16, we introduce a Relative Coordinated-based Link List (RCLL) algorithm, which stores each particle's coordinates in FP16 relative to the background cell that contains it. Our tests show three successive rounds of speedup relative to CPU-based NNPS algorithms. The first comes from parallel GPU computation, with up to a 1000x efficiency gain. The second comes from low-precision GPU computing, where the proposed FP16-based RCLL algorithm offers a 1.5x efficiency improvement over the FP64-based approach on GPUs. The third comes from optimizing GPU memory bandwidth utilization, which further boosts the efficiency of the FP16 RCLL algorithm by 2.7x, as demonstrated in an example with 1 million particles. Our code is released at https://github.com/pnnl/lpNNPS4SPH.
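The idea of storing cell-relative coordinates in FP16 can be illustrated with a minimal one-dimensional sketch. This is not the released RCLL implementation: the function names (`to_fp16`, `encode_relative`, `pair_distance`), the unit cell size, and the sample coordinates are all assumptions made for illustration, and the arithmetic is done in ordinary Python floats on FP16-rounded inputs rather than in true FP16 on a GPU. It only shows why a coordinate measured from its cell origin survives FP16 rounding better than an absolute coordinate.

```python
import math
import struct
from typing import Tuple


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value (stdlib only)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def encode_relative(x: float, cell_size: float) -> Tuple[int, float]:
    """Split x into an exact integer cell index and an FP16 offset in cell units."""
    cell = math.floor(x / cell_size)
    rel = to_fp16(x / cell_size - cell)  # lies in [0, 1] after rounding
    return cell, rel


def pair_distance(a: Tuple[int, float], b: Tuple[int, float], cell_size: float) -> float:
    """Distance from (cell, offset) pairs: exact integer part plus FP16 offsets."""
    (cell_a, rel_a), (cell_b, rel_b) = a, b
    return abs((cell_a - cell_b) + (rel_a - rel_b)) * cell_size


# Hypothetical sample: two particles in the same unit-sized cell, far from the origin.
cell_size = 1.0
x_i, x_j = 73.4567, 73.9021
true_d = abs(x_i - x_j)

# Naive approach: round the absolute coordinates to FP16.
err_abs = abs(abs(to_fp16(x_i) - to_fp16(x_j)) - true_d)

# Relative approach: round only the offset inside the cell.
d_rel = pair_distance(encode_relative(x_i, cell_size),
                      encode_relative(x_j, cell_size), cell_size)
err_rel = abs(d_rel - true_d)

print("error with absolute FP16 coordinates:", err_abs)
print("error with cell-relative FP16 coordinates:", err_rel)
```

In this sketch the absolute coordinates fall where FP16 values are spaced 1/16 apart, while the cell-relative offsets fall in [0, 1], where the spacing is at most 2^-11, so the distance error is correspondingly smaller. The integer cell index carries the large-scale position exactly, which is the separation of scales that the abstract attributes to RCLL.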