The rapidly advancing field of Augmented and Virtual Reality (AR/VR) demands real-time, photorealistic rendering on resource-constrained platforms. 3D Gaussian Splatting, which delivers state-of-the-art (SOTA) rendering efficiency and quality, has emerged as a promising solution across a broad spectrum of AR/VR applications. However, despite its effectiveness on high-end GPUs, it struggles on edge systems such as the Jetson Orin NX edge GPU, achieving only 7-17 FPS, well below the 60+ FPS standard required for truly immersive AR/VR experiences. To address this challenge, we perform a comprehensive analysis of Gaussian-based AR/VR applications and identify the Gaussian Blending Stage, which intensively computes each Gaussian's contribution at every pixel, as the primary bottleneck. In response, we propose the Gaussian Blending Unit (GBU), an edge-GPU plug-in module for real-time rendering in AR/VR applications. Notably, our GBU can be seamlessly integrated into conventional edge GPUs and collaboratively supports a wide range of AR/VR applications. Specifically, GBU adopts an intra-row sequential shading (IRSS) dataflow that shades each row of pixels sequentially from left to right using a two-step coordinate transformation. When deployed directly on a GPU, the proposed dataflow achieves a non-trivial 1.72x speedup on real-world static scenes, though it still falls short of real-time rendering performance. Recognizing the limited compute utilization of the GPU-based implementation, GBU further boosts rendering speed with a dedicated rendering engine that balances the workload across rows by aggregating computations from multiple Gaussians. Experiments across representative AR/VR applications demonstrate that our GBU provides a unified solution for on-device real-time rendering while maintaining SOTA rendering quality.
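To make the identified bottleneck concrete, the per-pixel work of the Gaussian Blending Stage can be sketched as standard front-to-back alpha compositing of depth-sorted 2D Gaussians, as used in 3D Gaussian Splatting. This is a minimal NumPy illustration of the generic blending formula, not the paper's GBU implementation; the function name, argument layout, and thresholds are illustrative assumptions.

```python
import numpy as np

def blend_pixel(pixel_xy, means, inv_covs, opacities, colors, t_min=1e-4):
    """Front-to-back alpha compositing of depth-sorted 2D Gaussians at one pixel.

    Illustrative sketch of the Gaussian Blending Stage: each Gaussian's
    contribution alpha_i = o_i * exp(-0.5 * d^T Sigma^{-1} d) is accumulated
    with transmittance T until T drops below t_min (early termination).
    """
    color = np.zeros(3)
    T = 1.0  # remaining transmittance along the ray
    for mu, inv_cov, o, c in zip(means, inv_covs, opacities, colors):
        d = pixel_xy - mu                       # offset from Gaussian center
        alpha = o * np.exp(-0.5 * d @ inv_cov @ d)
        if alpha < 1.0 / 255.0:                 # negligible contribution
            continue
        alpha = min(alpha, 0.99)                # clamp for numerical stability
        color += T * alpha * c                  # composite front-to-back
        T *= 1.0 - alpha
        if T < t_min:                           # early ray termination
            break
    return color
```

Evaluating this exponential for every covered Gaussian at every pixel is what dominates the frame time on edge GPUs, which is why the abstract targets this stage with the IRSS dataflow and the dedicated rendering engine.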