A light field is a type of image data that captures 3D scene information by recording the light rays emitted from a scene in various directions. It offers a more immersive perception than conventional 2D images, but at the cost of a huge data volume. In this paper, we draw inspiration from the visual characteristics of the Sub-Aperture Images (SAIs) of a light field and design a compact neural network representation for the light field compression task. The network backbone takes randomly initialized noise as input and is supervised on the SAIs of the target light field. It is composed of two types of complementary kernels: descriptive kernels (descriptors), which store scene description information learned during training, and modulatory kernels (modulators), which control the rendering of different SAIs from the queried perspectives. To further enhance the compactness of the network while retaining high quality in the decoded light field, we introduce modulator allocation and kernel tensor decomposition mechanisms, followed by non-uniform quantization and lossless entropy coding, to form an efficient compression pipeline. Extensive experiments demonstrate that our method outperforms other state-of-the-art (SOTA) methods by a significant margin in the light field compression task. Moreover, after aligning descriptors, the modulators learned from one light field can be transferred to new light fields for rendering dense views, indicating a potential solution for the view synthesis task.
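The last three stages of the pipeline (kernel decomposition, non-uniform quantization, lossless coding) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method of this paper: a truncated SVD stands in for the kernel tensor decomposition, a 1-D k-means (Lloyd) codebook stands in for the non-uniform quantizer, `zlib` stands in for the entropy coder, and all function names (`low_rank_factors`, `kmeans_codebook`, `compress_kernel`, `decompress_kernel`) are hypothetical.

```python
import zlib
import numpy as np


def low_rank_factors(kernel, rank):
    """Flatten a conv kernel (out, in, kh, kw) to a matrix and keep `rank` components.

    Assumption: truncated SVD is used here as a stand-in decomposition.
    """
    mat = kernel.reshape(kernel.shape[0], -1)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (out, rank), singular values folded in
    b = vt[:rank]                # shape (rank, in * kh * kw)
    return a, b


def kmeans_codebook(values, n_levels, n_iter=20):
    """Fit non-uniform quantization levels with 1-D Lloyd iterations.

    Returns the codebook and, for every value, the index of its nearest level.
    Indices are stored as uint8, so n_levels must not exceed 256.
    """
    flat = values.ravel()
    # Initialise at quantiles so the levels follow the weight density.
    centers = np.quantile(flat, np.linspace(0.0, 1.0, n_levels))
    for _ in range(n_iter):
        idx = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n_levels):
            members = flat[idx == k]
            if members.size:
                centers[k] = members.mean()
    idx = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx.astype(np.uint8).reshape(values.shape)


def compress_kernel(kernel, rank=4, n_levels=16):
    """Decompose, quantize, then losslessly code the index maps."""
    packed = []
    for factor in low_rank_factors(kernel, rank):
        centers, idx = kmeans_codebook(factor, n_levels)
        # zlib (DEFLATE) is a stand-in for a dedicated entropy coder.
        payload = zlib.compress(idx.tobytes(), level=9)
        packed.append((centers.astype(np.float32), payload, factor.shape))
    return packed


def decompress_kernel(packed, kernel_shape):
    """Invert the lossless stage exactly and rebuild the approximate kernel."""
    factors = []
    for centers, payload, shape in packed:
        idx = np.frombuffer(zlib.decompress(payload), dtype=np.uint8).reshape(shape)
        factors.append(centers[idx])  # look up each index in the codebook
    return (factors[0] @ factors[1]).reshape(kernel_shape)
```

Calling `compress_kernel(kernel, rank=2)` on an array of shape `(8, 8, 3, 3)` and passing the result to `decompress_kernel` returns an approximation with the same shape. Only the decomposition and quantization steps lose information; the coded index maps are recovered exactly.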