NeRF models based on explicit feature grids have shown promising rendering quality and a significant speed-up in training. However, these methods often require a large amount of data to represent a single scene or object. In this work, we present a compression model that minimizes entropy in the frequency domain in order to reduce the data size. First, we propose applying the discrete cosine transform (DCT) to tensorial radiance fields to compress the feature grid. The feature grid is transformed into coefficients, which are then quantized and entropy encoded, following an approach similar to the traditional video coding pipeline. Furthermore, to achieve a higher level of sparsity, we propose an entropy parameterization technique for the frequency domain, specifically for the DCT coefficients of the feature grid. Since the transformed coefficients are optimized during the training phase, the proposed model does not require any fine-tuning or additional information. Our model requires only a lightweight compression pipeline for encoding and decoding, making it easier to apply volumetric radiance field methods in real-world applications. Experimental results demonstrate that the proposed frequency-domain entropy model achieves superior compression performance across various datasets. The source code will be made publicly available.
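The transform, quantize, and entropy-code steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single 8x8 block standing in for one plane of the feature grid, a uniform quantization step of 0.05, and the empirical Shannon entropy of the quantized symbols as a rough proxy for the cost of an entropy coder. All function names (`dct2`, `idct2`, `quantize`, `entropy_bits`) are hypothetical. The learned entropy parameterization and the optimization of coefficients during training are not modeled here.

```python
import math
from collections import Counter


def dct_matrix(n):
    """Orthonormal DCT-II basis, returned as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dct2(block):
    """Separable 2D DCT of a square block: D @ X @ D^T."""
    d = dct_matrix(len(block))
    return matmul(matmul(d, block), transpose(d))


def idct2(coeffs):
    """Inverse of dct2 (the basis is orthonormal): D^T @ C @ D."""
    d = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(d), coeffs), d)


def quantize(coeffs, step):
    """Uniform scalar quantization to integer symbols."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(symbols, step):
    return [[s * step for s in row] for row in symbols]


def entropy_bits(symbols):
    """Empirical Shannon entropy, in bits per symbol, of a flat sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def flatten(m):
    return [v for row in m for v in row]


# Assumed toy input: a smooth 8x8 block standing in for a feature plane.
N, STEP = 8, 0.05
block = [[0.5 * math.cos(math.pi * i / N) + 0.3 * math.sin(math.pi * j / N)
          for j in range(N)] for i in range(N)]

symbols = quantize(dct2(block), STEP)          # what an encoder would store
recon = idct2(dequantize(symbols, STEP))       # what a decoder would recover

max_err = max(abs(a - b) for a, b in zip(flatten(block), flatten(recon)))
print("bits/symbol (DCT domain):", entropy_bits(flatten(symbols)))
print("max reconstruction error:", max_err)
```

Because the basis is orthonormal, the reconstruction error has the same Frobenius norm as the quantization error on the coefficients, so a smaller `STEP` trades a higher symbol entropy for a lower error. A real codec would replace `entropy_bits` with an actual entropy coder.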