Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across video tasks. However, existing methods often fail to fully exploit their representation capability, primarily because intermediate features are inadequately aligned with the target frame during decoding. This paper introduces a universal boosting framework for current implicit video representation approaches. Specifically, we use a conditional decoder with a temporal-aware affine transform module, which takes the frame index as a prior condition to align intermediate features with target frames. In addition, we introduce a sinusoidal NeRV-like block that generates diverse intermediate features and achieves a more balanced parameter distribution, thereby enhancing the model's capacity. With a reconstruction loss that preserves high-frequency information, our approach improves both the reconstruction quality and the convergence speed of multiple baseline INRs on video regression, and yields superior inpainting and interpolation results. Furthermore, we integrate a consistent entropy minimization technique and develop video codecs based on these boosted INRs. Experiments on the UVG dataset confirm that our enhanced codecs significantly outperform baseline INRs and offer competitive rate-distortion performance compared with traditional and learning-based codecs.
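To make the idea of conditioning on the frame index concrete, the following is a minimal NumPy sketch of a temporal-aware affine transform. It is not the paper's implementation: the abstract does not specify the module's internals, so the sinusoidal time embedding, the single linear layer, the per-channel scale and shift, and all names (`positional_encoding`, `TemporalAffine`, `num_freqs`) are assumptions chosen for illustration.

```python
import numpy as np


def positional_encoding(t, num_freqs=4):
    """Embed a normalized frame index t in [0, 1] with sines and cosines (assumed design)."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])


class TemporalAffine:
    """Hypothetical module: per-channel scale and shift predicted from the frame index."""

    def __init__(self, channels, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        self.channels = channels
        self.num_freqs = num_freqs
        # One linear layer from the time embedding to (gamma, beta); an assumption.
        self.weight = rng.normal(0.0, 0.1, size=(2 * channels, 2 * num_freqs))
        # Bias initialised so that gamma starts near 1 and beta near 0.
        self.bias = np.concatenate([np.ones(channels), np.zeros(channels)])

    def __call__(self, features, t):
        """Apply gamma(t) * features + beta(t) to a (C, H, W) feature map."""
        emb = positional_encoding(t, self.num_freqs)
        params = self.weight @ emb + self.bias
        gamma, beta = params[: self.channels], params[self.channels:]
        # Broadcast the per-channel parameters over the spatial dimensions.
        return gamma[:, None, None] * features + beta[:, None, None]


# Usage: modulate the same intermediate features for two different frames.
features = np.ones((8, 4, 4))
module = TemporalAffine(channels=8)
out_a = module(features, t=0.25)
out_b = module(features, t=0.75)
```

In this sketch the same feature map is modulated differently for each frame index, which is one simple way a decoder could steer shared intermediate features toward a specific target frame.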