Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability for representing, generating, and manipulating various data types, allowing continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still lag in rate-distortion performance by a large margin, and they require a huge number of parameters and long training iterations to capture high-frequency details, which limits their wider applicability. Resolving these problems remains challenging, and doing so would make INRs more accessible in compression tasks. We take a step towards resolving these shortcomings by introducing NeRV++, an enhanced implicit neural video representation. NeRV++ is a straightforward yet effective enhancement over the original NeRV (neural representations for videos) decoder architecture, featuring separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be directly represented as a function approximated by a neural network, and it significantly enhances representation capacity beyond current INR-based video codecs. We evaluate our method on the UVG, MCL-JVC, and Bunny datasets, achieving competitive results for video compression with INRs. This narrows the gap to autoencoder-based video coding, marking a significant stride in INR-based video compression research.
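To make the decoder description concrete, the following is a minimal sketch of the bookkeeping for one decoder stage, not the authors' implementation. It rests on several assumptions the abstract does not confirm: "separable conv2d" is read as a depthwise convolution followed by a 1x1 pointwise convolution; the UB is taken to be a 3x3 convolution followed by pixel shuffle, as in the original NeRV; and the skip path simply resizes the stage input bilinearly. All function names are hypothetical.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def separable_conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a depthwise-separable convolution (assumed reading
    of "separable conv2d"): depthwise k x k, then pointwise 1 x 1."""
    depthwise = c_in * k * k + (c_in if bias else 0)   # one k x k filter per channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 channel mixing
    return depthwise + pointwise


def decoder_stage_shapes(shape, c_out, scale):
    """Track (channels, height, width) through one hypothetical stage:
    SCRB -> UB -> SCRB on the main path, bilinear resize on the skip path."""
    c_in, h, w = shape
    after_first_scrb = (c_in, h, w)                 # residual block keeps the shape
    after_ub_conv = (c_out * scale * scale, h, w)   # UB conv expands channels (assumed)
    after_shuffle = (c_out, h * scale, w * scale)   # pixel shuffle trades channels for resolution
    after_second_scrb = after_shuffle               # residual block keeps the shape
    # Bilinear interpolation changes resolution only, so the skip keeps c_in
    # channels; how the channel counts are reconciled is not stated in the abstract.
    skip = (c_in, h * scale, w * scale)
    return {"main": after_second_scrb, "skip": skip}


# Example: 64 channels, 3 x 3 kernels.
print(conv2d_params(64, 64, 3))            # 36928
print(separable_conv2d_params(64, 64, 3))  # 4800
print(decoder_stage_shapes((64, 9, 16), c_out=32, scale=2))
```

Under these assumptions, a 3x3 separable convolution with 64 input and output channels uses roughly an eighth of the parameters of its standard counterpart, which is one plausible way such blocks could add capacity without inflating model size.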