Recent work on implicit neural representations (INRs) has demonstrated their potential for efficiently representing and encoding conventional video content. In this paper we, for the first time, extend their application to immersive (multi-view) videos by proposing MV-HiNeRV, a new INR-based immersive video codec. MV-HiNeRV is an enhanced version of a state-of-the-art INR-based video codec, HiNeRV, which was developed for single-view video compression. We modify the model to learn a different group of feature grids for each view, while sharing the learnt network parameters among all views. This enables the model to effectively exploit the spatio-temporal and inter-view redundancy within multi-view videos. The proposed codec was used to compress multi-view texture and depth video sequences under the MPEG Immersive Video (MIV) Common Test Conditions, and was compared against the MIV Test Model (TMIV), which uses the VVenC video codec. The results demonstrate the superior performance of MV-HiNeRV, with significant coding gains (up to 72.33\%) over TMIV. The implementation of MV-HiNeRV has been made publicly available for further development and evaluation.
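The core design choice described above, per-view feature grids combined with decoder parameters shared across all views, can be illustrated with a minimal sketch. This is a hypothetical simplification for exposition only: the actual HiNeRV decoder is a hierarchical upsampling network, whereas here a single linear map stands in for it, and all dimensions are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not from the paper):
# V views, T frames, H x W spatial grid, C feature channels.
V, T, H, W, C = 3, 4, 8, 8, 16
OUT = 3  # e.g. RGB texture output

# Per-view parameters: each view v learns its own feature grid,
# capturing view-specific content.
view_grids = [rng.standard_normal((T, H, W, C)) * 0.1 for _ in range(V)]

# Shared parameters: one decoder used by every view, so inter-view
# redundancy is captured once rather than V times. A single linear
# layer stands in for HiNeRV's hierarchical decoder network.
W_dec = rng.standard_normal((C, OUT)) * 0.1
b_dec = np.zeros(OUT)

def decode(view: int, t: int) -> np.ndarray:
    """Decode frame t of a given view from its own grid + the shared decoder."""
    feats = view_grids[view][t]      # (H, W, C), view-specific
    return feats @ W_dec + b_dec     # (H, W, OUT), shared mapping

frame = decode(view=1, t=2)

# Parameter accounting shows why sharing helps: the decoder cost is
# paid once, while only the (typically much smaller) grids scale with V.
per_view_params = T * H * W * C
shared_params = W_dec.size + b_dec.size
total_params = V * per_view_params + shared_params
```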