We introduce Region-Adaptive Learned Hierarchical Encoding (RALHE) for 3D Gaussian Splatting (3DGS) data. Although 3DGS has recently become popular for novel view synthesis, the size of trained models limits its deployment in bandwidth-constrained applications such as volumetric media streaming. To address this, we propose a learned hierarchical latent representation that builds on the principles of "overfitted" learned image compression (e.g., Cool-Chic and C3) to efficiently encode 3DGS attributes. Unlike images, 3DGS data have an irregular spatial distribution of Gaussians (the geometry) and consist of multiple attributes (signals) defined on that irregular geometry. Our codec is designed to account for these differences between images and 3DGS. Specifically, we leverage the octree structure of the voxelized 3DGS geometry to obtain a hierarchical multi-resolution representation. Our approach overfits latents to each Gaussian attribute under a global rate constraint. The latents of each attribute are decoded independently through a lightweight decoder network. To estimate the bitrate during training, we employ an autoregressive probability model that uses octree-derived contexts from the 3D point structure. The multi-resolution latents, the decoder, and the autoregressive entropy coding network are jointly optimized for each Gaussian attribute. Experiments demonstrate that the proposed RALHE compression framework achieves a rendering PSNR gain of up to 2 dB at low bitrates (less than 1 MB) compared to baseline 3DGS compression methods.
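To make the idea of an octree-derived multi-resolution representation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the Gaussian centers have already been voxelized to integer coordinates in [0, 2^depth), and it assumes one scalar latent per occupied octree node. The function names `build_octree_levels` and `gather_latents` are hypothetical. The decoder network, the quantization, and the autoregressive entropy model mentioned in the abstract are omitted.

```python
import numpy as np


def build_octree_levels(voxels, depth):
    """Return the occupied nodes of each octree level, coarse to fine.

    voxels: (N, 3) array of non-negative integer coordinates in [0, 2**depth).
    Each returned entry is (coords, parent_idx), where parent_idx[i] is the
    row of node i's parent in the previous (coarser) level; it is None for
    the root level.
    """
    coords = np.unique(np.asarray(voxels, dtype=np.int64), axis=0)
    levels = []
    for _ in range(depth):
        # Halving the integer coordinates yields the parent cell of each node.
        parents, parent_idx = np.unique(coords >> 1, axis=0, return_inverse=True)
        levels.append((coords, parent_idx.reshape(-1)))
        coords = parents
    levels.append((coords, None))  # root level
    return levels[::-1]


def gather_latents(levels, latents):
    """Propagate per-node latents of every level down to the finest nodes.

    latents[k] is a 1-D array with one value per node of level k.
    Returns an (n_fine, n_levels) array whose row i lists the latent of
    finest node i's ancestor at each level (coarse to fine). In a
    Cool-Chic-style codec, such a feature vector would feed a small decoder.
    """
    idx = np.arange(len(levels[-1][0]))
    feats = []
    for k in range(len(levels) - 1, -1, -1):
        feats.append(latents[k][idx])
        parent_idx = levels[k][1]
        if parent_idx is not None:
            idx = parent_idx[idx]  # step from level k to level k - 1
    return np.stack(feats[::-1], axis=1)


if __name__ == "__main__":
    pts = np.array([[0, 0, 0], [1, 0, 0], [7, 7, 7]])
    lv = build_octree_levels(pts, depth=3)
    # Illustrative latents only: node index used as the latent value.
    z = [np.arange(len(c), dtype=np.float64) for c, _ in lv]
    print(gather_latents(lv, z))
```

Concatenating ancestor latents per point is one plausible analogue of the upsample-and-stack step used by overfitted image codecs; the paper may combine levels differently.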