During the last decade, GPU technology has shifted from purely general-purpose computation toward the inclusion of application-specific integrated circuits (ASICs), such as Tensor Cores and Ray Tracing (RT) cores. Although these special-purpose GPU cores were designed to accelerate specific fields such as AI and real-time rendering, recent research has exploited them to accelerate other tasks that have traditionally relied on regular GPU computing. In this work we present RTXRMQ, a new approach that computes range minimum queries (RMQs) with RT cores. The main contribution is a geometric formulation of RMQ in which array elements become triangles, placed and shaped according to the element's value and its position in the array, respectively, such that the closest hit of a ray launched from a point given by the query parameters corresponds to the result of that query. Experimental results show that RTXRMQ is currently best suited for query ranges that are small relative to the problem size, achieving speedups of up to $5\times$ and $2.3\times$ over the state-of-the-art CPU (HRMQ) and GPU (LCA) approaches, respectively. Although RTXRMQ is currently surpassed by LCA for medium and large query ranges, it remains competitive, being $2.5\times$ and $4\times$ faster than HRMQ, a highly parallel CPU approach. Furthermore, performance-scaling experiments across the latest RTX GPU architectures show that, if the current RT scaling trend continues, RTXRMQ's performance would scale at a higher rate than that of HRMQ and LCA, making the approach even more relevant for future high-performance applications that employ batches of RMQs.
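The geometric idea can be illustrated with a small CPU emulation. The layout below is a hypothetical simplification, not necessarily the exact geometry RTXRMQ uses: element $i$ with value $v_i$ becomes one large triangle at depth $z = v_i$ whose 2D footprint in the $(l, r)$ query plane covers every query point with $l \le i \le r$. A query $(l, r)$ is then answered by a ray cast from $(l, r)$ along $+z$, and the closest hit is the covering triangle with the smallest depth, i.e. the range minimum. The brute-force loop stands in for the BVH traversal that RT cores perform in hardware, and the names `build_scene`, `covers`, and `rmq` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triangle:
    # 2D vertices in the (l, r) query plane; z is the triangle's constant depth
    a: tuple[int, int]
    b: tuple[int, int]
    c: tuple[int, int]
    z: float
    index: int


def build_scene(values):
    """Hypothetical layout: one triangle per element, depth = value."""
    n = len(values)
    k = 2 * n  # large enough that the triangle covers its whole quadrant of [0, n)^2
    scene = []
    for i, v in enumerate(values):
        # Right triangle with legs along x = i and y = i: it contains exactly the
        # query points (l, r) in [0, n)^2 with l <= i <= r.
        scene.append(Triangle((i, i), (i, i + k), (i - k, i), v, i))
    return scene


def _cross(o, p, q):
    # z-component of (p - o) x (q - o)
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def covers(t, p):
    # Inclusive point-in-triangle test using the signs of three cross products
    d1 = _cross(t.a, t.b, p)
    d2 = _cross(t.b, t.c, p)
    d3 = _cross(t.c, t.a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def rmq(scene, l, r):
    """Emulate a ray from (l, r) along +z; return the index of the closest hit."""
    if not 0 <= l <= r < len(scene):
        raise ValueError("query must satisfy 0 <= l <= r < n")
    best = None
    for t in scene:  # an RT core would prune this loop with a BVH
        if covers(t, (l, r)):
            # Closest hit = smallest depth; ties broken by smaller index
            if best is None or (t.z, t.index) < (best.z, best.index):
                best = t
    return best.index
```

For example, with `scene = build_scene([5, 2, 8, 1, 9, 3])`, the call `rmq(scene, 2, 5)` returns `3`, the position of the minimum value `1` in that range. Each query here costs $O(n)$ triangle tests; the premise of using RT cores is that hardware BVH traversal avoids testing most triangles.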