In neural video codecs, current state-of-the-art methods typically adopt multi-scale motion compensation to handle diverse motions. These methods estimate and compress either optical flow or deformable offsets to reduce inter-frame redundancy. However, flow-based methods often suffer from inaccurate motion estimation in complicated scenes, while deformable-convolution-based methods are more robust but incur a higher bit cost for motion coding. In this paper, we propose a hybrid context generation module that combines the advantages of both approaches and achieves accurate compensation at a low bit cost. Specifically, considering the characteristics of features at different scales, we adopt flow-guided deformable compensation at the largest scale to produce accurate alignment in detailed regions, and perform flow-based warping at smaller scales to save the bit cost of motion coding. Furthermore, we design a local-global context enhancement module to fully exploit the local and global information in previously reconstructed signals. Experimental results demonstrate that our proposed Hybrid Local-Global Context learning (HLGC) method significantly enhances state-of-the-art methods on standard test datasets.
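The flow-based warping used at the smaller scales can be sketched as a standard backward bilinear warp: each position in the current frame samples the previous frame's features at the location its motion vector points to. The following is a minimal NumPy sketch under that assumption; the function name, tensor layout, and border handling are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def warp_bilinear(feat, flow):
    """Backward-warp a feature map with a dense flow field.

    feat: (H, W, C) features of the previously reconstructed frame
    flow: (H, W, 2) motion vectors (dx, dy), pointing from the current
          frame into the previous frame
    returns: (H, W, C) warped features aligned to the current frame
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions in the previous frame, clamped to the border.
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    # Integer corners and fractional weights for bilinear interpolation.
    x0 = np.floor(sx).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(sy).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx = (sx - x0)[..., None]; wy = (sy - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

With zero flow this reduces to the identity, and an integer flow reduces to a pixel shift, which is a quick sanity check for the interpolation weights. The flow-guided deformable compensation at the largest scale can be viewed as the same idea with per-kernel-sample offsets (flow plus a learned residual) instead of a single vector per position.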