Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders their widespread adoption is the computational bottleneck of volumetric rendering. 3D Gaussian splatting (3DGS) has recently emerged as an alternative that represents a scene with 3D Gaussians and renders images through a rasterization pipeline rather than volumetric rendering, achieving very fast rendering speed and promising image quality. A significant drawback, however, is that 3DGS requires a substantial number of 3D Gaussians to maintain the fidelity of the rendered images, which demands a large amount of memory and storage. To address this issue, we focus on two key objectives: reducing the number of Gaussian points without sacrificing performance, and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color that employs a grid-based neural field rather than spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of the Gaussians by vector quantization. Combined with model compression techniques such as quantization and entropy coding, our approach consistently achieves over 25$\times$ reduction in storage and faster rendering than 3DGS while maintaining the quality of the scene representation. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Our project page is available at https://maincold2.github.io/c3dgs/.
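As a rough illustration of the codebook idea mentioned in the abstract, the sketch below quantizes a set of per-Gaussian attribute vectors against a small codebook, so that each Gaussian stores only an integer index instead of a full vector. This is a minimal sketch under stated assumptions, not the paper's method: the codebook is fitted with plain k-means as a stand-in for the learned codebooks, and the function names (`nearest_code`, `fit_codebook`, `quantize`), the attribute dimension, and the codebook size are all hypothetical choices made for illustration.

```python
import numpy as np


def nearest_code(vectors, codebook):
    """Return, for each vector, the index of its closest codebook entry."""
    # Pairwise squared Euclidean distances, shape (N, K).
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def fit_codebook(vectors, num_codes, iters=10, seed=0):
    """Fit a codebook with plain k-means (an assumption, used only as a stand-in)."""
    rng = np.random.default_rng(seed)
    # Initialize the codes from randomly chosen, distinct input vectors.
    start = rng.choice(len(vectors), size=num_codes, replace=False)
    codebook = vectors[start].copy()
    for _ in range(iters):
        idx = nearest_code(vectors, codebook)
        for k in range(num_codes):
            members = vectors[idx == k]
            if len(members) > 0:  # leave codes with no members unchanged
                codebook[k] = members.mean(axis=0)
    return codebook


def quantize(vectors, codebook):
    """Replace each vector by its nearest code; return indices and reconstruction."""
    idx = nearest_code(vectors, codebook)
    return idx, codebook[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical stand-in for per-Gaussian geometric attributes (e.g. 3 scale values).
    attrs = rng.normal(size=(1000, 3)).astype(np.float32)
    codebook = fit_codebook(attrs, num_codes=16)
    idx, recon = quantize(attrs, codebook)
    print("mean squared quantization error:", float(((attrs - recon) ** 2).mean()))
```

With such a scheme, the stored data are the codebook plus one index per Gaussian. The indices can be kept in a narrow integer type and are natural candidates for a later entropy coding stage, which is where the storage saving would come from.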