Neural Video Compression with Feature Modulation

The emerging conditional coding-based neural video codec (NVC) shows superiority over the commonly used residual coding-based codec, and the latest NVC already claims to outperform the best traditional codec. However, critical problems still block the practicality of NVC. In this paper, we propose a powerful conditional coding-based NVC that solves two of these problems via feature modulation. The first is how to support a wide quality range in a single model. Previous NVCs with this capability support only about a 3.8 dB PSNR range on average. To tackle this limitation, we modulate the latent feature of the current frame via a learnable quantization scaler. During training, we design a uniform quantization parameter sampling mechanism to improve the harmonization of encoding and quantization. This results in better learning of the quantization scaler and helps our NVC support about an 11.4 dB PSNR range. The second is how to make NVC still work under a long prediction chain. We show that the previous SOTA NVC suffers an obvious quality degradation when using a large intra-period setting. To address this, we propose modulating the temporal feature with a periodically refreshing mechanism to boost the quality. Besides solving the above two problems, we also design a single model that can support both RGB and YUV colorspaces. Notably, under a single intra-frame setting, our codec can achieve 29.7% bitrate saving over the previous SOTA NVC with a 16% reduction in MACs. Our codec serves as a notable landmark in the evolution of NVC. The code is available at https://github.com/microsoft/DCVC.
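To make the idea of latent modulation with a quantization scaler and uniform quantization parameter sampling more concrete, the following is a minimal illustrative sketch, not the implementation in the released code. The number of quantization parameters (64), the scale range, the log-spaced initialization, and all function names are assumptions made for illustration only; in an actual codec the scales would be learnable parameters updated by the rate-distortion loss.

```python
import random

NUM_QP = 64  # assumed number of quantization parameters (hypothetical)


def make_scales(num_qp=NUM_QP, s_min=0.5, s_max=8.0):
    """Return log-spaced scales, one per QP index.

    These stand in for the learnable quantization scaler; the range
    [s_min, s_max] is an arbitrary choice for this sketch.
    """
    ratio = (s_max / s_min) ** (1.0 / (num_qp - 1))
    return [s_min * ratio ** i for i in range(num_qp)]


def sample_qp(rng, num_qp=NUM_QP):
    """Uniformly sample one QP index for a training step."""
    return rng.randrange(num_qp)


def modulate_and_quantize(latent, scale):
    """Divide the latent by the scale, round, and rescale.

    A larger scale gives coarser quantization (lower rate, lower quality).
    """
    return [round(v / scale) * scale for v in latent]


def max_abs_error(latent, scale):
    """Largest absolute reconstruction error for a given scale."""
    recon = modulate_and_quantize(latent, scale)
    return max(abs(a - b) for a, b in zip(latent, recon))


if __name__ == "__main__":
    rng = random.Random(0)
    scales = make_scales()
    latent = [0.3, -1.7, 2.2]  # toy latent values
    for _ in range(3):
        qp = sample_qp(rng)
        print(qp, round(scales[qp], 3), modulate_and_quantize(latent, scales[qp]))
```

Sampling the index uniformly means every scale is visited with equal probability during training, which is one plausible reading of how a single model could be exposed to the whole quality range rather than a few fixed rate points.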
