For any video codec, coding efficiency depends heavily on whether the current signal to be encoded can find relevant contexts in the previously reconstructed signals. Traditional codecs have verified that more contexts bring substantial coding gain, but at the cost of high time consumption. For the emerging neural video codec (NVC), however, the contexts are still limited, which leads to a low compression ratio. To boost NVC, this paper proposes increasing the context diversity in both the temporal and spatial dimensions. First, we guide the model to learn hierarchical quality patterns across frames, which enriches long-term yet high-quality temporal contexts. Furthermore, to tap the potential of the optical-flow-based coding framework, we introduce group-based offset diversity, with cross-group interaction for better context mining. In addition, this paper adopts a quadtree-based partition to increase spatial context diversity when encoding the latent representation in parallel. Experiments show that our codec obtains a 23.5% bitrate saving over the previous state-of-the-art (SOTA) NVC. Better yet, our codec surpasses ECM, the next-generation traditional codec still under development, in both the RGB and YUV420 colorspaces in terms of PSNR. The code is available at https://github.com/microsoft/DCVC.
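To make the quadtree-based partition more concrete, the following is a minimal sketch of one way such a schedule could be arranged. It is an illustration under stated assumptions, not the released implementation: we assume the latent is split into four channel groups, each group is tiled into 2x2 spatial patches, and coding proceeds in four parallel steps. The function names `coding_step` and `step_mask` and the cyclic ordering are hypothetical.

```python
def coding_step(c, y, x, num_channels):
    """Return the step (0..3) at which latent element (c, y, x) is coded.

    Assumption: four channel groups and a cyclic shift of the 2x2 patch
    positions per group. The ordering is illustrative only.
    """
    group = c * 4 // num_channels        # channel group index, 0..3
    position = (y % 2) * 2 + (x % 2)     # index inside the 2x2 patch, 0..3
    return (position + group) % 4        # hypothetical cyclic schedule


def step_mask(step, num_channels, height, width):
    """Binary mask indexed [c][y][x]: 1 where the element is coded at `step`."""
    return [[[1 if coding_step(c, y, x, num_channels) == step else 0
              for x in range(width)]
             for y in range(height)]
            for c in range(num_channels)]


if __name__ == "__main__":
    masks = [step_mask(s, num_channels=8, height=4, width=4) for s in range(4)]
    # Number of latent elements coded in each of the four steps.
    print([sum(v for ch in m for row in ch for v in row) for m in masks])
```

Under this arrangement every element is coded exactly once, and each step touches all four positions of every 2x2 patch (in different channel groups), so later steps can draw on already-decoded neighbours at every spatial location rather than at a fixed subset.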