3D Gaussian Splatting (3DGS) has emerged as a leading neural rendering technique for high-fidelity view synthesis, prompting the development of dedicated 3DGS accelerators for mobile applications. Through in-depth analysis, we identify two major limitations in the conventional decoupled preprocessing-rendering dataflow adopted by existing accelerators: 1) a significant portion of the preprocessed Gaussians is never used in rendering, and 2) the same Gaussian is loaded repeatedly across different tile renderings. Together, these limitations incur substantial computational and data movement overhead. To address these issues, we propose GCC, a novel accelerator designed for fast and energy-efficient 3DGS inference. At the dataflow level, GCC introduces: 1) cross-stage conditional processing, which interleaves preprocessing and rendering to dynamically skip unnecessary Gaussian preprocessing; and 2) Gaussian-wise rendering, which completes all rendering operations for a given Gaussian before moving to the next, thereby eliminating duplicated Gaussian loading. We also propose an alpha-based boundary identification method that derives compact and accurate Gaussian regions, thereby reducing rendering costs. We implement the GCC accelerator in 28 nm technology. Extensive experiments demonstrate that GCC significantly outperforms the state-of-the-art 3DGS inference accelerator, GSCore, in both performance and energy efficiency.
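To make the two dataflow observations concrete, the following is a minimal Python sketch, not the GCC implementation. It makes several assumptions that the abstract does not state: each projected Gaussian is approximated by a hypothetical axis-aligned pixel box (`Splat`), tiles are square, and the alpha of a Gaussian at Mahalanobis distance d from its center follows the standard 3DGS form alpha = opacity * exp(-d^2 / 2) with a contribution threshold of 1/255. All names (`Splat`, `tiles_touched`, `loads_tile_wise`, `loads_gaussian_wise`, `alpha_cutoff_sigmas`) are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Splat:
    """Hypothetical screen-space box of one projected Gaussian, in pixels."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive


def tiles_touched(s, tile):
    """Return the set of (tx, ty) tile indices that the splat's box overlaps."""
    return {
        (tx, ty)
        for tx in range(s.x0 // tile, (s.x1 - 1) // tile + 1)
        for ty in range(s.y0 // tile, (s.y1 - 1) // tile + 1)
    }


def loads_tile_wise(splats, tile):
    """Tile-by-tile rendering: every tile fetches each Gaussian overlapping it,
    so a Gaussian spanning k tiles is loaded k times."""
    return sum(len(tiles_touched(s, tile)) for s in splats)


def loads_gaussian_wise(splats):
    """Gaussian-by-Gaussian rendering: each Gaussian is fetched once and all of
    its pixels are processed before the next Gaussian is loaded."""
    return len(splats)


def alpha_cutoff_sigmas(opacity, alpha_min=1.0 / 255.0):
    """Distance (in standard deviations) beyond which alpha drops below alpha_min.

    Assumes alpha(d) = opacity * exp(-d**2 / 2); solving alpha(d) >= alpha_min
    gives d <= sqrt(2 * ln(opacity / alpha_min)). Returns 0.0 when the Gaussian
    can never reach alpha_min, i.e. it contributes to no pixel.
    """
    if opacity <= alpha_min:
        return 0.0
    return math.sqrt(2.0 * math.log(opacity / alpha_min))


splats = [Splat(0, 0, 32, 32), Splat(10, 10, 20, 20), Splat(15, 15, 17, 17)]
tile_loads = loads_tile_wise(splats, tile=16)
gaussian_loads = loads_gaussian_wise(splats)
low_opacity_cutoff = alpha_cutoff_sigmas(0.1)
```

With 16-pixel tiles, each of the three example splats straddles four tiles, so the tile-wise count is 12 loads against 3 for the Gaussian-wise order. Under the assumed alpha model, a Gaussian with opacity 0.1 stops contributing at about 2.5 standard deviations, which is tighter than a fixed 3-sigma bound; this only illustrates why an alpha-aware boundary can be more compact and says nothing about how GCC computes it in hardware.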