Neural radiance fields (NeRF) have advanced 3D volumetric video technology, but the large data volumes they involve pose significant challenges for storage and transmission. Existing solutions typically compress NeRF representations after training, which separates representation training from compression. In this paper, we directly learn a compact NeRF representation for volumetric video during training, using a proposed rate-aware compression framework. Specifically, we use a simple yet effective modeling strategy to reduce the temporal redundancy of the NeRF representation. During training, an implicit entropy model estimates the bitrate of the NeRF representation; this entropy model is then encoded into the bitstream to assist decoding. This enables precise bitrate estimation and thus a compact NeRF representation. Furthermore, we propose an adaptive quantization strategy that learns the optimal quantization step for the NeRF representation. Finally, the NeRF representation is optimized under a rate-distortion trade-off. The proposed framework can be applied to different representations. Experimental results demonstrate that our approach significantly reduces storage size with marginal distortion and achieves state-of-the-art rate-distortion performance for volumetric video on the HumanRF and ReRF datasets. Compared to the previous state-of-the-art method TeTriRF, we achieve a BD-rate of approximately -80% on the HumanRF dataset and -60% on the ReRF dataset.
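To make the rate-distortion trade-off concrete, the following is a minimal sketch, not the authors' implementation. It assumes a discretized Gaussian as a stand-in for the implicit entropy model, a fixed scalar quantization step rather than a learned one, and mean squared error as the distortion. All function names (`quantize`, `estimate_bits`, `rd_loss`) and the weight `lam` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def quantize(values, step):
    """Uniform quantization: index of the nearest multiple of `step`.

    Python's round() sends exact ties to the even integer.
    """
    return [round(v / step) for v in values]


def dequantize(symbols, step):
    """Map integer symbols back to reconstructed values."""
    return [s * step for s in symbols]


def estimate_bits(symbols, step, mu, sigma):
    """Estimated code length in bits under a discretized Gaussian.

    ASSUMPTION: this simple prior replaces the implicit entropy model
    described in the abstract.
    """
    total = 0.0
    for s in symbols:
        lo = (s - 0.5) * step
        hi = (s + 0.5) * step
        p = gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)
        total += -math.log2(max(p, 1e-12))  # clamp to avoid log2(0)
    return total


def rd_loss(values, step, lam, mu=0.0, sigma=1.0):
    """Return (loss, distortion, rate) with loss = distortion + lam * rate.

    Distortion is the mean squared error; rate is bits per value.
    """
    symbols = quantize(values, step)
    recon = dequantize(symbols, step)
    distortion = sum((v - r) ** 2 for v, r in zip(values, recon)) / len(values)
    rate = estimate_bits(symbols, step, mu, sigma) / len(values)
    return distortion + lam * rate, distortion, rate
```

Evaluating `rd_loss` on the same values with a small and a large step shows the expected behavior: the finer step gives lower distortion at a higher estimated rate. In a learned setting the step would be a trainable parameter updated by gradient descent on this loss, which requires a differentiable surrogate for rounding that this sketch omits.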