Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers

Recent advances in neural rendering and the devices that support it have opened the door to immersive 3D experiences, significantly transforming how humans interact with intelligent devices across diverse applications. However, achieving the real-time rendering speeds that immersive interaction requires is still hindered by (1) the lack of a universal algorithmic solution across application scenarios and (2) the fact that existing devices and accelerators are dedicated to specific rendering pipelines. To overcome these challenges, we develop a unified neural rendering accelerator that supports a wide range of typical neural rendering pipelines, enabling real-time, on-device rendering across different applications while maintaining both efficiency and compatibility. Our design is based on the insight that, although neural rendering pipelines vary and their algorithms continue to evolve, they typically share common operators and therefore execute largely similar workloads. Accordingly, we propose a reconfigurable hardware architecture that dynamically adjusts its dataflow to meet the rendering-metric requirements of each application, effectively supporting both typical pipelines and the latest hybrid rendering pipelines. Benchmarking experiments and ablation studies on both synthetic and real-world scenes demonstrate the effectiveness of the proposed accelerator. The proposed unified accelerator is the first solution capable of achieving real-time neural rendering across varied representative pipelines on edge devices, potentially paving the way for the next generation of neural graphics applications.
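As a rough illustration of the shared-operator insight, the sketch below models three well-known families of neural rendering pipelines as sequences of coarse operator kinds and maps each operator to a dataflow mode of a single reconfigurable compute array. It is a toy sketch under stated assumptions, not the accelerator's actual design. The pipeline decompositions, the operator names, the mode names (`systolic`, `gather`, `scan`, and so on), and the helpers `schedule` and `mode_usage` are all hypothetical. They are not the operator taxonomy or the dataflows used by Uni-Render.

```python
from collections import Counter

# Hypothetical, coarse decompositions of three pipeline families into operator
# kinds. These are illustrative assumptions, not Uni-Render's operator taxonomy.
PIPELINES = {
    "nerf_mlp": ["ray_sample", "encode", "gemm", "volume_composite"],
    "hash_grid": ["ray_sample", "gather_interp", "gemm", "volume_composite"],
    "gaussian_splat": ["project", "sort", "alpha_composite"],
}

# Hypothetical dataflow modes of one reconfigurable compute array.
# Both compositing steps accumulate transmittance front to back, i.e. a prefix
# product over samples, so both are mapped to the same "scan" mode.
MODE_FOR_OP = {
    "ray_sample": "elementwise",
    "encode": "elementwise",
    "project": "elementwise",
    "gemm": "systolic",
    "gather_interp": "gather",
    "sort": "sort_network",
    "volume_composite": "scan",
    "alpha_composite": "scan",
}


def schedule(pipeline: str) -> list[tuple[str, str]]:
    """Return (operator, dataflow mode) pairs for one pipeline, in order."""
    return [(op, MODE_FOR_OP[op]) for op in PIPELINES[pipeline]]


def mode_usage() -> Counter:
    """Count how many pipelines need each dataflow mode at least once."""
    usage: Counter = Counter()
    for ops in PIPELINES.values():
        usage.update({MODE_FOR_OP[op] for op in ops})
    return usage


if __name__ == "__main__":
    for name in PIPELINES:
        print(name, schedule(name))
    print(mode_usage())
```

Under these assumed decompositions, the `elementwise` and `scan` modes are needed by all three pipelines and `systolic` by two. This is the kind of reuse that motivates a single reconfigurable array over one fixed-function unit per pipeline.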