In this work we introduce three ideas that can further improve particle fixed-radius nearest neighbors (FRNN) physics simulations running on RT cores: i) a real-time update/rebuild ratio optimizer for the bounding volume hierarchy (BVH), ii) a new way of using RT cores, with two variants, that eliminates the need for a neighbor list, and iii) a technique that enables RT cores for FRNN with periodic boundary conditions (BC). An experimental evaluation using the Lennard-Jones FRNN interaction model as a case study shows that the proposed update/rebuild ratio optimizer adapts to the different dynamics that emerge during a simulation, yielding an RT core pipeline up to $\sim 3.4\times$ faster than other known approaches to managing the BVH. In terms of simulation step performance, the proposed variants significantly improve the speedup and energy efficiency (EE) of the base RT core idea, with gains ranging from $\sim1.3\times$ for small radii to $\sim2.0\times$ for log-normal radius distributions. Furthermore, the proposed variants can simulate cases that would otherwise not fit in memory because of neighbor lists, such as clusters of particles with log-normal radius distributions. The proposed RT core technique for periodic BC is effective, introducing no significant performance penalty. The proposed methods also scale in both performance and EE across GPU generations. Throughout the experimental evaluation, we also identify the simulation cases where regular GPU computation should still be preferred, contributing to the understanding of the strengths and limitations of RT cores.
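To make the FRNN interaction concrete, the following is a minimal CPU sketch in Python/NumPy of Lennard-Jones forces under periodic BC. It assumes a cubic box, default $\epsilon = \sigma = 1$, and a brute-force all-pairs search with the minimum image convention; it is only an illustration of the interaction being accelerated, not the RT core method proposed in this work. The function name `lj_forces_periodic` and its parameters are hypothetical.

```python
import numpy as np


def lj_forces_periodic(pos, box, r_cut, epsilon=1.0, sigma=1.0):
    """Brute-force fixed-radius near neighbors (FRNN) Lennard-Jones forces
    in a cubic periodic box of side `box`, using the minimum image convention.

    Hypothetical reference sketch: O(N^2), no BVH, no neighbor list.
    """
    forces = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        # Displacement vectors d_ij = r_j - r_i from particle i to all particles
        d = pos - pos[i]
        # Minimum image: wrap each component to the nearest periodic copy
        d -= box * np.round(d / box)
        r2 = np.sum(d * d, axis=1)
        # Keep only neighbors inside the fixed radius, excluding i itself
        mask = (r2 > 0.0) & (r2 < r_cut * r_cut)
        if not np.any(mask):
            continue
        inv_r6 = (sigma * sigma / r2[mask]) ** 3  # (sigma / r)^6
        # Force on i: F_i = -24 eps (2 (s/r)^12 - (s/r)^6) / r^2 * d_ij
        coef = 24.0 * epsilon * (2.0 * inv_r6 * inv_r6 - inv_r6) / r2[mask]
        forces[i] -= np.sum(coef[:, None] * d[mask], axis=0)
    return forces
```

For example, two particles at $x = 0.5$ and $x = 9.5$ in a box of side 10 interact across the boundary at distance 1.0, so each receives a repulsive force of magnitude $24\epsilon/\sigma$ pointing away from the other's periodic image. RT core approaches replace the all-pairs loop above with BVH traversal, but the per-pair force and the periodic wrapping are the same quantities being computed.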