For any video codec, coding efficiency relies heavily on whether the signal currently being encoded can find relevant contexts in previously reconstructed signals. Traditional codecs have shown that more contexts bring substantial coding gain, but at a considerable cost in coding time. However, the contexts available to the emerging neural video codec (NVC) are still limited, leading to a low compression ratio. To boost NVC, this paper proposes increasing context diversity in both the temporal and spatial dimensions. First, we guide the model to learn hierarchical quality patterns across frames, which enriches long-term yet high-quality temporal contexts. Furthermore, to tap the potential of the optical flow-based coding framework, we introduce group-based offset diversity, together with a cross-group interaction for better context mining. In addition, this paper adopts a quadtree-based partition to increase spatial context diversity when encoding the latent representation in parallel. Experiments show that our codec obtains a 23.5% bitrate saving over the previous SOTA NVC. Better yet, in terms of PSNR, our codec outperforms ECM, the next-generation traditional codec that is still under development, in both the RGB and YUV420 colorspaces. The code is available at https://github.com/microsoft/DCVC.
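The abstract does not say how the quadtree-based partition schedules parallel latent coding, so the following is only an illustrative sketch under stated assumptions, not the DCVC implementation. It assumes the latent of shape (C, H, W) is split into four channel groups and each 2×2 spatial block into four positions, and that at step s channel group g codes position (g + s) mod 4. The function name `quadtree_step_masks` and this cyclic schedule are hypothetical. Under this assumption each step codes a quarter of the latent in parallel, and after the first step every spatial location already has some decoded channels that later steps can use as context.

```python
import numpy as np


def quadtree_step_masks(channels: int, height: int, width: int) -> np.ndarray:
    """Return boolean masks of shape (4, channels, height, width).

    masks[s] marks the latent elements coded in parallel at step s.
    Hypothetical schedule (an assumption, not taken from DCVC): channels are
    split into 4 equal groups, and within every 2x2 spatial block, channel
    group g codes block position (g + s) % 4 at step s.
    """
    if channels % 4 or height % 2 or width % 2:
        raise ValueError("channels must be divisible by 4; height and width must be even")

    # Channel group index (0..3) for every channel, shape (C,).
    group = np.arange(channels) // (channels // 4)
    # Position inside each 2x2 block in row-major order (0..3), shape (H, W).
    pos = (np.arange(height) % 2)[:, None] * 2 + (np.arange(width) % 2)[None, :]

    masks = np.zeros((4, channels, height, width), dtype=bool)
    for s in range(4):
        target = (group + s) % 4  # block position each channel codes at step s
        # Broadcast (1, H, W) against (C, 1, 1) to get a (C, H, W) mask.
        masks[s] = pos[None, :, :] == target[:, None, None]
    return masks


masks = quadtree_step_masks(8, 4, 4)
print(masks.sum(axis=(1, 2, 3)))  # elements coded per step → [32 32 32 32]
print(masks[0].any(axis=0).all())  # every location touched after step 0 → True
```

Compared with a two-step checkerboard, spreading the four block positions across channel groups means that no spatial location is left without decoded neighbors in the channel dimension once the first step completes, which is one plausible reading of "spatial context diversity".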