Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck of volumetric rendering. Meanwhile, 3D Gaussian splatting (3DGS) has recently emerged as an alternative that represents scenes with 3D Gaussians and renders images through a rasterization pipeline instead of volumetric rendering, achieving very fast rendering speed and promising image quality. However, 3DGS has a significant drawback: it requires a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which demands a large amount of memory and storage. To address this critical issue, we focus on two key objectives: reducing the number of Gaussian points without sacrificing performance, and compressing Gaussian attributes such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color that employs a grid-based neural field rather than spherical harmonics. Finally, we learn codebooks that compactly represent the geometric attributes of Gaussians through vector quantization. In our extensive experiments, we consistently achieve over 10$\times$ storage reduction and faster rendering than 3DGS while maintaining the quality of the scene representation. Our work provides a comprehensive framework for 3D scene representation that achieves high performance, fast training, compactness, and real-time rendering. Our project page is available at https://maincold2.github.io/c3dgs/.
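To make two of the compression ideas above more concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes (1) a per-Gaussian learnable mask whose forward pass keeps a Gaussian when the sigmoid of its logit exceeds a threshold (the 0.01 threshold and the straight-through-estimator remark are assumptions), and (2) vector quantization of geometric attributes using a plain k-means codebook, swapped in here for illustration; the paper's actual quantizer and codebook sizes may differ. All function names (`binary_mask`, `learn_codebook`, `storage_bits`) are hypothetical, and the 7-D attribute size in the storage example assumes 3 scale plus 4 rotation (quaternion) values per Gaussian.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_mask(logits: Sequence[float], threshold: float = 0.01) -> List[int]:
    """Forward pass of a (hypothetical) learnable mask: 1 keeps a Gaussian, 0 prunes it.

    During training, a straight-through estimator could pass gradients through
    sigmoid(logit) despite the hard threshold; that backward pass is not shown.
    """
    return [1 if sigmoid(m) > threshold else 0 for m in logits]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    return min(range(len(codebook)),
               key=lambda j: sum((c - x) ** 2 for c, x in zip(codebook[j], v)))


def learn_codebook(vectors: List[Vector], k: int, iters: int = 20,
                   seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain k-means: returns a k-entry codebook and one codeword index per vector."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]  # init from random inputs
    for _ in range(iters):
        assign = [nearest(codebook, v) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # leave empty clusters unchanged
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook, [nearest(codebook, v) for v in vectors]


def storage_bits(n: int, dim: int, k: int, float_bits: int = 32) -> Tuple[int, int]:
    """Bits for n raw dim-D float vectors vs. a k-entry codebook plus per-vector indices."""
    raw = n * dim * float_bits
    index_bits = max(1, math.ceil(math.log2(k)))
    return raw, k * dim * float_bits + n * index_bits


if __name__ == "__main__":
    # Toy data: 5 Gaussians, each with a mask logit and a 2-D stand-in attribute.
    logits = [3.0, -8.0, 2.5, 4.0, 1.0]
    attrs = [[0.0, 0.0], [9.0, 9.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    mask = binary_mask(logits)
    kept = [a for a, m in zip(attrs, mask) if m]  # prune before quantizing
    codebook, idx = learn_codebook(kept, k=2)
    print("mask:", mask, "kept:", len(kept))
    print("codebook:", codebook, "indices:", idx)
    print("bits (raw, vq), 1e6 Gaussians, 7-D, k=256:", storage_bits(10**6, 7, 256))
```

The ordering in the demo reflects the design intuition: masking shrinks the number of Gaussians first, and quantization then replaces each remaining attribute vector of `dim` 32-bit floats with a single index of ⌈log₂ k⌉ bits, so the fixed cost of the codebook is amortized over many Gaussians.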