For any video codec, coding efficiency relies heavily on whether the current signal to be encoded can find relevant contexts in the previously reconstructed signals. Traditional codecs have verified that more contexts bring substantial coding gain, but at a high cost in coding time. For the emerging neural video codec (NVC), however, the contexts are still limited, leading to a low compression ratio. To boost NVC, this paper proposes increasing context diversity in both the temporal and spatial dimensions. First, we guide the model to learn hierarchical quality patterns across frames, which enriches the long-term yet high-quality temporal contexts. Furthermore, to tap the potential of the optical-flow-based coding framework, we introduce group-based offset diversity, in which cross-group interaction is proposed for better context mining. In addition, this paper adopts a quadtree-based partition to increase spatial context diversity when encoding the latent representation in parallel. Experiments show that our codec obtains a 23.5% bitrate saving over the previous SOTA NVC. Better yet, our codec has surpassed ECM, the next-generation traditional codec still under development, in both the RGB and YUV420 colorspaces in terms of PSNR. The code is available at https://github.com/microsoft/DCVC.
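To make the quadtree-based partition more concrete, the sketch below shows one way such a schedule could be laid out. It is an illustration under stated assumptions, not the paper's implementation: the function name `quadtree_step_masks`, the split into four equal channel groups, the four coding steps, and the rotation rule `(group + step) % 4` are all hypothetical choices made here. The idea it illustrates is that each 2x2 spatial patch has four positions, and different channel groups code different positions at each step, so every step can draw on previously coded neighbours from several positions and channels.

```python
import numpy as np


def quadtree_step_masks(channels: int, height: int, width: int) -> np.ndarray:
    """Build boolean masks of shape (4, channels, height, width).

    masks[s] marks the latent entries coded in parallel at step s.
    Hypothetical schedule, assumed for illustration only.
    """
    if channels % 4 or height % 2 or width % 2:
        raise ValueError("channels must be divisible by 4, height and width by 2")

    # Index of each location inside its 2x2 patch:
    # 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
    rows = np.arange(height)[:, None] % 2
    cols = np.arange(width)[None, :] % 2
    position = 2 * rows + cols  # shape (height, width)

    # Assign each channel to one of four equally sized groups.
    group = np.arange(channels) // (channels // 4)  # shape (channels,)

    masks = np.empty((4, channels, height, width), dtype=bool)
    for step in range(4):
        # Assumed rotation: group g codes patch position (g + step) % 4.
        target = (group + step) % 4
        masks[step] = position[None, :, :] == target[:, None, None]
    return masks


if __name__ == "__main__":
    masks = quadtree_step_masks(channels=8, height=4, width=4)
    # Every latent entry is coded in exactly one of the four steps.
    print(bool((masks.sum(axis=0) == 1).all()))
```

With this layout, each step codes a quarter of the latent entries, and every spatial location is touched by some channel group at every step. This is the sense in which such a partition diversifies the spatial contexts available to later steps.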