This paper presents novel approaches to parallelizing particle interactions on a GPU when each cell contains few particles and the interactions are limited by a cutoff distance. It surveys classical algorithms and then introduces two alternatives designed to exploit shared memory. The first copies the particles of a sub-box, while the second loads the particles in a pencil along the X-axis. The implementations are compared on three GPU models using CUDA and HIP. The results show that the X-pencil approach can provide a significant speedup, but only in very specific cases.
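To make the problem setting concrete, the following is a minimal CPU reference sketch in C++ of cutoff-limited interactions on a uniform cell grid. It is not the paper's GPU implementation: the `Particle` type, the function name `countPairsWithinCutoff`, the non-periodic cubic box, and the choice to count pairs rather than evaluate a force are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle type for illustration; not taken from the paper.
struct Particle { double x, y, z; };

// Counts unordered particle pairs closer than `cutoff` in a cubic box
// [0, boxSize)^3 without periodic boundaries. Particles are assumed to lie
// inside the box. The cell width is chosen to be at least `cutoff` whenever
// boxSize >= cutoff, so only the 27 surrounding cells need to be visited.
std::size_t countPairsWithinCutoff(const std::vector<Particle>& particles,
                                   double boxSize, double cutoff) {
    int n = static_cast<int>(boxSize / cutoff);
    if (n < 1) n = 1;                       // single cell: all pairs are tested
    const double cellWidth = boxSize / n;

    // Map a coordinate to a cell index, clamping values on the upper boundary.
    auto cellCoord = [&](double v) {
        int c = static_cast<int>(v / cellWidth);
        if (c < 0) c = 0;
        if (c >= n) c = n - 1;
        return c;
    };
    auto cellIndex = [&](int cx, int cy, int cz) {
        return (static_cast<std::size_t>(cz) * n + cy) * n + cx;
    };

    // Bin particle indices into cells.
    std::vector<std::vector<std::size_t>> cells(
        static_cast<std::size_t>(n) * n * n);
    for (std::size_t i = 0; i < particles.size(); ++i) {
        const Particle& p = particles[i];
        cells[cellIndex(cellCoord(p.x), cellCoord(p.y), cellCoord(p.z))]
            .push_back(i);
    }

    const double cutoff2 = cutoff * cutoff;
    std::size_t count = 0;
    for (int cz = 0; cz < n; ++cz)
    for (int cy = 0; cy < n; ++cy)
    for (int cx = 0; cx < n; ++cx) {
        const auto& home = cells[cellIndex(cx, cy, cz)];
        for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            const int nx = cx + dx, ny = cy + dy, nz = cz + dz;
            if (nx < 0 || ny < 0 || nz < 0 || nx >= n || ny >= n || nz >= n)
                continue;                   // no periodic wrap in this sketch
            const auto& other = cells[cellIndex(nx, ny, nz)];
            for (std::size_t i : home) {
                for (std::size_t j : other) {
                    if (j <= i) continue;   // visit each unordered pair once
                    const double ddx = particles[i].x - particles[j].x;
                    const double ddy = particles[i].y - particles[j].y;
                    const double ddz = particles[i].z - particles[j].z;
                    if (ddx * ddx + ddy * ddy + ddz * ddz < cutoff2) ++count;
                }
            }
        }
    }
    return count;
}
```

With few particles per cell, the inner loops are short and a GPU port of this pattern tends to be dominated by memory traffic rather than arithmetic, which is the regime where staging neighbouring particles in shared memory may help.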