In this work we introduce three ideas that can further improve particle fixed-radius nearest neighbor (FRNN) physics simulations running on RT cores: i) a real-time update/rebuild ratio optimizer for the bounding volume hierarchy (BVH), ii) a new use of RT cores, with two variants, that eliminates the need for a neighbor list, and iii) a technique that enables RT cores for FRNN with periodic boundary conditions (BC). An experimental evaluation using the Lennard-Jones FRNN interaction model as a case study shows that the proposed update/rebuild ratio optimizer adapts to the different dynamics that emerge during a simulation, yielding an RT core pipeline up to $\sim 3.4\times$ faster than other known approaches to managing the BVH. In terms of simulation step performance, the proposed variants significantly improve the speedup and energy efficiency (EE) of the base RT core approach, by factors ranging from $\sim 1.3\times$ for small radii to $\sim 2.0\times$ for log-normal radius distributions. Furthermore, the proposed variants can simulate cases that would otherwise not fit in memory because of the neighbor list, such as clusters of particles with a log-normal radius distribution. The proposed RT core technique for periodic BC is effective, as it introduces no significant performance penalty. In terms of scaling, the proposed methods improve both performance and EE across GPU generations. Throughout the experimental evaluation, we also identify the simulation cases where regular GPU computation should still be preferred, contributing to the understanding of the strengths and limitations of RT cores.
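For readers unfamiliar with the case study, the sketch below shows what a Lennard-Jones FRNN computation with periodic BC evaluates, written as a plain all-pairs CPU reference rather than the RT core method proposed here. It assumes a cubic periodic box of side `box`, reduced units for `epsilon` and `sigma`, and the minimum-image convention for periodicity (valid only when `radius <= box / 2`). The function name `lj_energy_frnn` is hypothetical and used only for illustration.

```python
import math


def lj_energy_frnn(positions, box, radius, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy over all particle pairs closer than `radius`.

    Hypothetical reference sketch: O(n^2) all-pairs search in a cubic periodic
    box of side `box`, using the minimum-image convention (assumes radius <= box / 2).
    """
    n = len(positions)
    r2_cut = radius * radius
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for axis in range(3):
                d = positions[i][axis] - positions[j][axis]
                # Minimum image: wrap the separation into [-box/2, box/2].
                d -= box * round(d / box)
                d2 += d * d
            # Fixed-radius query: only pairs inside the cutoff contribute.
            if d2 < r2_cut:
                sr6 = (sigma * sigma / d2) ** 3  # (sigma / r)^6
                energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)  # distance of the Lennard-Jones potential minimum
    # Two particles interacting across the periodic boundary of a box of side 10.
    print(lj_energy_frnn([(0.0, 0.0, 0.0), (10.0 - r_min, 0.0, 0.0)], box=10.0, radius=2.5))
```

In this example the two particles are far apart inside the box but only $2^{1/6}\sigma$ apart through the periodic boundary, so the pair sits at the potential minimum and the total energy is approximately $-\epsilon$.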