Hardware Acceleration of Neural Graphics

Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NRs). NRs have recently been used to learn the geometric and material properties of scenes and to use that information to synthesize photorealistic imagery, thereby promising a replacement for traditional rendering algorithms with scalable quality and predictable performance. In this work we ask: does neural graphics (NG) need hardware support? We study representative NG applications and show that rendering 4K resolution at 60 FPS leaves a gap of 1.5X-55X between the desired performance and what current GPUs deliver. For AR/VR applications, there is an even larger gap of 2-4 orders of magnitude between the desired performance and the required system power. We identify the input encoding and the MLP kernels as the performance bottlenecks: they consume 72%, 60%, and 59% of application time for multi-resolution hashgrid, multi-resolution densegrid, and low-resolution densegrid encodings, respectively. We propose the NG processing cluster (NGPC), a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the remaining kernels by fusing them together in Vulkan, which yields a 9.94X kernel-level performance improvement over the unfused implementation of the pre-processing and post-processing kernels. Our results show that NGPC gives up to 58X end-to-end application-level performance improvement for multi-resolution hashgrid encoding; on average across the four NG applications, the performance benefits are 12X, 20X, 33X, and 39X for scaling factors of 8, 16, 32, and 64, respectively. With multi-resolution hashgrid encoding, NGPC enables rendering 4K resolution at 30 FPS for NeRF and 8K resolution at 120 FPS for all our other NG applications.
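The arithmetic behind two of the figures above can be made concrete with a short sketch. It is a back-of-the-envelope illustration, not the authors' evaluation methodology. It assumes that "4K" means 3840x2160 pixels and uses a simple Amdahl's-law model, in which only the input encoding and MLP kernels are accelerated, to suggest why the remaining kernels also need attention. All function names are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixel throughput required to sustain a resolution at a frame rate."""
    return width * height * fps


def amdahl_ceiling(accelerated_fraction: float) -> float:
    """Upper bound on end-to-end speedup when only `accelerated_fraction`
    of the runtime is accelerated, even infinitely fast (Amdahl's law)."""
    return 1.0 / (1.0 - accelerated_fraction)


if __name__ == "__main__":
    # Assumption: 4K is taken as 3840x2160 (UHD).
    budget = frame_budget_ms(60)
    print(f"frame budget at 60 FPS: {budget:.2f} ms")
    print(f"pixels per second at 4K, 60 FPS: {pixels_per_second(3840, 2160, 60):,}")

    # A 1.5X-55X gap corresponds to current frame times of roughly
    # 1.5x and 55x the 60 FPS budget.
    print(f"implied frame times: {1.5 * budget:.1f} ms to {55 * budget:.1f} ms")

    # Share of application time spent in encoding + MLP kernels, per the abstract.
    shares = {
        "multi-resolution hashgrid": 0.72,
        "multi-resolution densegrid": 0.60,
        "low-resolution densegrid": 0.59,
    }
    for name, share in shares.items():
        print(f"{name}: ceiling {amdahl_ceiling(share):.2f}X if other kernels are untouched")
```

Under this simplified model, accelerating only the encoding and MLP kernels would cap the end-to-end gain at a few X, which is consistent with the decision to also fuse the pre-processing and post-processing kernels. The reported shares are aggregate figures, so per-application bounds will differ.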
