3D Gaussian Splatting (3DGS) has emerged as a leading neural rendering technique for high-fidelity view synthesis, prompting the development of dedicated 3DGS accelerators for resource-constrained platforms. The conventional decoupled preprocessing-rendering dataflow in existing accelerators has two major limitations: 1) a significant portion of the preprocessed Gaussians is never used in rendering, and 2) the same Gaussian is loaded repeatedly while rendering different tiles. Together, these cause substantial computation and data movement overhead. To address these issues, we propose GCC, an accelerator designed for fast and energy-efficient 3DGS inference. GCC introduces a new dataflow featuring: 1) \textit{cross-stage conditional processing}, which interleaves preprocessing and rendering to dynamically skip unnecessary Gaussian preprocessing; and 2) \textit{Gaussian-wise rendering}, which completes all rendering operations for a given Gaussian before moving to the next, thereby eliminating duplicated Gaussian loading. We also propose an alpha-based boundary identification method that derives compact and accurate Gaussian regions, reducing rendering cost. We implement the GCC accelerator in 28 nm technology. Extensive experiments demonstrate that GCC significantly outperforms the state-of-the-art 3DGS inference accelerator, GSCore, in both performance and energy efficiency.
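To make the idea of an alpha-based boundary concrete, the following is a minimal sketch and not the GCC implementation. It assumes the standard 3DGS alpha model, $\alpha = o \cdot \exp(-\tfrac{1}{2} d^\top \Sigma^{-1} d)$, and the contribution cutoff $\alpha \ge 1/255$ used by the reference 3DGS rasterizer. Under those assumptions, a Gaussian affects only pixels with $d^\top \Sigma^{-1} d \le 2\ln(255\,o)$, and the axis-aligned box around that ellipse follows from the diagonal of $\Sigma$. The function names `alpha_bbox` and `three_sigma_radius` are hypothetical and chosen for illustration only.

```python
import math

ALPHA_MIN = 1.0 / 255.0  # assumed cutoff, as in the reference 3DGS rasterizer


def alpha_bbox(cov2d, opacity, alpha_min=ALPHA_MIN):
    """Half-extents (hx, hy) of the box enclosing all pixels with alpha >= alpha_min.

    cov2d is the projected 2x2 covariance ((sxx, sxy), (sxy, syy)).
    Returns None when the Gaussian can never reach alpha_min.
    """
    (sxx, _), (_, syy) = cov2d
    if opacity < alpha_min:
        return None  # contributes to no pixel, so it can be skipped entirely
    # Solve opacity * exp(-0.5 * r2) = alpha_min for the squared Mahalanobis radius.
    r2 = 2.0 * math.log(opacity / alpha_min)
    # Axis-aligned extent of the ellipse d^T cov^-1 d = r2.
    return math.sqrt(r2 * sxx), math.sqrt(r2 * syy)


def three_sigma_radius(cov2d):
    """Opacity-independent baseline: 3 * sqrt(largest eigenvalue) as a square half-extent."""
    (sxx, sxy), (_, syy) = cov2d
    mid = 0.5 * (sxx + syy)
    det = sxx * syy - sxy * sxy
    lam_max = mid + math.sqrt(max(mid * mid - det, 0.0))
    return 3.0 * math.sqrt(lam_max)


if __name__ == "__main__":
    cov = ((4.0, 0.0), (0.0, 1.0))  # elongated along x
    print("alpha-based half-extents:", alpha_bbox(cov, opacity=0.1))
    print("3-sigma half-extent:     ", three_sigma_radius(cov))
```

In this sketch a low-opacity, elongated Gaussian gets a smaller and non-square region than the 3-sigma square, while a Gaussian whose opacity is below the cutoff is dropped before any per-pixel work. For opacities near 1 the alpha-derived region can be slightly larger than the 3-sigma one, which is why the bound is better described as accurate than as uniformly smaller.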