Recently, Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codecs. However, most existing NVC methods rely heavily on transmitting Motion Vectors (MVs) to generate accurate contextual features, which has two drawbacks. (1) Compressing and transmitting MVs requires a specialized MV encoder and decoder, which makes the modules redundant. (2) The presence of the MV encoder-decoder complicates the training strategy. In this paper, we present a novel Single Stream NVC framework (SSNVC), which removes the complex MV encoder-decoder structure and uses a one-stage training strategy. SSNVC uses temporal information implicitly by adding the previous entropy model feature to the current entropy model, and by using the previous two frames to generate predicted motion information at the decoder side. In addition, we enhance the frame generator to produce higher-quality reconstructed frames. Experiments demonstrate that SSNVC achieves state-of-the-art performance on multiple benchmarks while greatly simplifying both the compression and training processes.
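The core single-stream idea, conditioning the entropy model on its own feature from the previous frame rather than on an explicitly transmitted motion vector, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name `EntropyModel`, the linear layers standing in for convolutional stacks, and all dimensions are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

class EntropyModel:
    """Toy entropy model that carries its feature across frames (sketch only)."""

    def __init__(self, latent_dim=8, feat_dim=8):
        # A single linear layer stands in for the real convolutional stack.
        self.w = rng.standard_normal((latent_dim + feat_dim, feat_dim)) * 0.1
        self.head = rng.standard_normal((feat_dim, 2 * latent_dim)) * 0.1
        # Temporal state: zeros for the first frame (no previous feature yet).
        self.prev_feat = np.zeros(feat_dim)

    def __call__(self, latent):
        # Fuse the current latent with the previous frame's entropy feature,
        # so temporal context flows in without coding a motion vector.
        feat = np.tanh(np.concatenate([latent, self.prev_feat]) @ self.w)
        self.prev_feat = feat  # carry the feature forward to the next frame
        mu, log_sigma = np.split(feat @ self.head, 2)
        return mu, np.exp(log_sigma)  # Gaussian parameters for entropy coding

model = EntropyModel()
frames = [rng.standard_normal(8) for _ in range(3)]
for y in frames:
    mu, sigma = model(y)  # parameters for the current frame's latent
```

The same stateful pattern would also let the decoder form predicted motion information from the previous two reconstructed frames, since both are available at the decoder side without extra bits.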