Video compression has recently benefited from implicit neural representations (INRs), which model a video as a continuous function. INRs offer compact storage and flexible reconstruction, making them a promising alternative to traditional codecs. However, most existing INR-based methods treat the temporal dimension as an independent input, which limits their ability to capture complex temporal dependencies. To address this, we propose TeNeRV, a Hierarchical Temporal Neural Representation for Videos. TeNeRV integrates short- and long-term dependencies through two key components. First, an Inter-Frame Feature Fusion (IFF) module aggregates features from adjacent frames, enforcing local temporal coherence and capturing fine-grained motion. Second, a GoP-Adaptive Modulation (GAM) mechanism partitions a video into Groups of Pictures (GoPs) and learns group-specific priors that modulate the network parameters, enabling adaptive representations across different GoPs. Extensive experiments demonstrate that TeNeRV consistently outperforms existing INR-based methods in rate-distortion performance, validating the effectiveness of the proposed approach.
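The abstract does not specify how IFF aggregates neighbouring features or exactly how GAM applies its priors, so the following is only a minimal, hypothetical Python sketch of the two ideas under stated assumptions: fixed-size, non-overlapping GoPs; a three-tap weighted average over frames t-1, t, t+1 as a stand-in for IFF; and a FiLM-style per-GoP scale and shift applied to a feature vector as a stand-in for GAM (the actual mechanism modulates network parameters). The names `gop_index`, `inter_frame_fusion`, `gop_modulate`, `represent_frame`, and `priors` are illustrative and are not TeNeRV's actual API.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def gop_index(t: int, gop_size: int) -> int:
    """Map frame index t to its GoP index, assuming fixed-size, non-overlapping GoPs."""
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    return t // gop_size


def inter_frame_fusion(
    features: Sequence[Vector], t: int, weights: Tuple[float, float, float] = (0.25, 0.5, 0.25)
) -> Vector:
    """Hypothetical IFF stand-in: weighted average of frames t-1, t, t+1.

    Neighbour indices are clamped at the sequence boundaries, so the first and
    last frames reuse themselves in place of the missing neighbour.
    """
    n = len(features)
    neighbours = (max(t - 1, 0), t, min(t + 1, n - 1))
    fused = [0.0] * len(features[t])
    for w, idx in zip(weights, neighbours):
        for d, value in enumerate(features[idx]):
            fused[d] += w * value
    return fused


def gop_modulate(x: Vector, gamma: Vector, beta: Vector) -> Vector:
    """Hypothetical GAM stand-in: FiLM-style elementwise scale and shift, gamma * x + beta."""
    return [g * v + b for v, g, b in zip(x, gamma, beta)]


def represent_frame(
    features: Sequence[Vector],
    t: int,
    gop_size: int,
    priors: Sequence[Tuple[Vector, Vector]],
) -> Vector:
    """Fuse local temporal context, then modulate with the (gamma, beta) prior of frame t's GoP."""
    fused = inter_frame_fusion(features, t)
    gamma, beta = priors[gop_index(t, gop_size)]
    return gop_modulate(fused, gamma, beta)


if __name__ == "__main__":
    # Toy per-frame features: 4 frames, 2 channels each.
    features = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
    # One (gamma, beta) pair per GoP; with gop_size=2 there are two GoPs.
    priors = [([1.0, 1.0], [0.0, 0.0]), ([2.0, 2.0], [0.5, -0.5])]
    print(represent_frame(features, 1, 2, priors))  # [2.0, 1.0]
    print(represent_frame(features, 3, 2, priors))  # [8.0, 5.0]
```

The split mirrors the short-/long-term division described above: fusion only looks one frame in each direction, while the GoP prior is shared by every frame in the group. In a trained model the priors would be learnable parameters rather than the fixed numbers used here to keep the example runnable.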