The proliferation of high-resolution videos places great storage and bandwidth pressure on cloud video services, driving the development of next-generation video codecs. Despite the great progress made in neural video coding, existing approaches remain far from economical deployment once the tradeoff between complexity and rate-distortion performance is taken into account. To remove these roadblocks for neural video coding, in this paper we propose a new framework featuring standard compatibility, high performance, and low decoding complexity. We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video codec, to encode videos at different resolutions. The rate-distortion-optimal downsampling ratio is signaled to the decoder at the sequence level for each target rate. We design a low-complexity neural post-processor architecture that can handle different upsampling ratios. Changing the resolution exploits the spatial redundancy in high-resolution videos, while the neural wrapper achieves a further rate-distortion improvement through end-to-end optimization with a codec proxy. Our lightweight post-processor architecture has a complexity of 516 MACs/pixel and achieves a 9.3% BD-Rate reduction over VVC on the UVG dataset and 6.4% on AOM CTC Class A1. Our approach has the potential to further advance the performance of the latest video coding standards using neural processing with minimal added complexity.
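The abstract says that the rate-distortion-optimal downsampling ratio is chosen per sequence for each target rate and signaled to the decoder, but it does not state the selection rule. The sketch below assumes one plausible rule: for each target rate point, trial-encode the sequence at every candidate ratio and pick the ratio that minimizes the Lagrangian cost J = D + λR. The class `RDPoint`, the functions `select_ratio` and `per_sequence_signaling`, the λ values, and all numbers are hypothetical illustrations, not the paper's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RDPoint:
    """One trial encoding of a sequence (hypothetical structure).

    ratio: downsampling ratio applied by the pre-processor (1.0 = native resolution)
    rate_kbps: measured bitrate of the encoding
    distortion: distortion after post-processing, e.g. MSE at native resolution
    """
    ratio: float
    rate_kbps: float
    distortion: float


def select_ratio(points: list[RDPoint], lam: float) -> float:
    """Return the ratio minimizing the assumed Lagrangian cost J = D + lam * R."""
    if not points:
        raise ValueError("need at least one candidate encoding")
    best = min(points, key=lambda p: p.distortion + lam * p.rate_kbps)
    return best.ratio


def per_sequence_signaling(
    candidates_by_rate: dict[str, list[RDPoint]],
    lam_by_rate: dict[str, float],
) -> dict[str, float]:
    """Pick one ratio per target rate point; this mapping is what would be signaled."""
    return {
        rate_point: select_ratio(points, lam_by_rate[rate_point])
        for rate_point, points in candidates_by_rate.items()
    }


# Hypothetical measurements for one sequence at two target rate points.
candidates = {
    "QP22": [
        RDPoint(1.0, 6000.0, 4.0),   # J = 4.0 + 0.001 * 6000 = 10.0
        RDPoint(1.5, 4000.0, 7.0),   # J = 7.0 + 0.001 * 4000 = 11.0
        RDPoint(2.0, 2500.0, 12.0),  # J = 12.0 + 0.001 * 2500 = 14.5
    ],
    "QP32": [
        RDPoint(1.0, 2000.0, 10.0),  # J = 10.0 + 0.005 * 2000 = 20.0
        RDPoint(1.5, 1200.0, 12.0),  # J = 12.0 + 0.005 * 1200 = 18.0
        RDPoint(2.0, 800.0, 18.0),   # J = 18.0 + 0.005 * 800 = 22.0
    ],
}
lambdas = {"QP22": 0.001, "QP32": 0.005}

if __name__ == "__main__":
    print(per_sequence_signaling(candidates, lambdas))  # {'QP22': 1.0, 'QP32': 1.5}
```

With these made-up numbers, the high-rate point keeps native resolution while the low-rate point prefers 1.5× downsampling, which illustrates why a single fixed ratio would be suboptimal across rates. Choosing per sequence rather than per frame keeps the signaling overhead negligible.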