Learned video compression methods have attracted considerable interest in the video coding community because they have matched or even exceeded the rate-distortion (RD) performance of traditional video codecs. However, many current learning-based methods exploit only short-range temporal information, which limits their performance. In this paper, we focus on exploiting the unique characteristics of video content and further exploring temporal information to enhance compression performance. Specifically, to exploit long-range temporal information, we propose a temporal prior that is updated continuously within the group of pictures (GOP) during inference. As a result, the temporal prior contains valuable temporal information from all decoded images within the current GOP. To exploit short-range temporal information, we propose progressive guided motion compensation to achieve robust and effective compensation. In detail, we design a hierarchical structure to achieve multi-scale compensation. More importantly, we use optical flow guidance to generate pixel offsets between feature maps at each scale, and the compensation result at each scale is used to guide compensation at the following scale. Extensive experimental results demonstrate that our method achieves better RD performance than state-of-the-art video compression approaches. The code is publicly available at https://github.com/Huairui/LSTVC.
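The two mechanisms described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the abstract does not specify the update rule or the offset generator, so the moving-average update in `update_prior`, the nearest-neighbour `warp`, and the `offset_fn` hook (which by default simply reuses the rescaled flow and ignores the guidance) are hypothetical placeholders for what would be learned modules. All function names here are assumptions introduced for illustration.

```python
import numpy as np

def update_prior(prior, decoded, alpha=0.5):
    # Placeholder update (assumption): exponential moving average.
    # The abstract does not state the actual update rule.
    if prior is None:
        return decoded.copy()
    return alpha * prior + (1.0 - alpha) * decoded

def priors_for_sequence(decoded_frames, gop_size):
    """Return the prior available when coding each frame (None at GOP starts)."""
    available, prior = [], None
    for i, frame in enumerate(decoded_frames):
        if i % gop_size == 0:
            prior = None  # the prior only accumulates within the current GOP
        available.append(prior)
        prior = update_prior(prior, frame)  # fold in the newly decoded frame
    return available

def warp(feat, offsets):
    """Nearest-neighbour backward warp: out[y, x] = feat[y + dy, x + dx], clamped."""
    h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + offsets[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + offsets[..., 1]).astype(int), 0, w - 1)
    return feat[src_y, src_x]

def default_offset_fn(flow_s, guidance):
    # Placeholder (assumption): use the rescaled flow as the pixel offsets.
    # A learned module would also use `guidance` from the coarser scale.
    return flow_s

def progressive_compensation(ref, flow, factors=(4, 2, 1), offset_fn=default_offset_fn):
    """Coarse-to-fine compensation; returns one compensated map per scale."""
    results, guidance = [], None
    for f in factors:
        ref_s = ref[::f, ::f]              # reference features at this scale
        flow_s = flow[::f, ::f] / f        # displacements shrink with resolution
        offsets = offset_fn(flow_s, guidance)
        comp = warp(ref_s, offsets)
        results.append(comp)
        guidance = comp                    # passed on to the next, finer scale
    return results
```

In this sketch, `ref` is a single-channel 2-D feature map and `flow` has shape `(H, W, 2)` holding `(dy, dx)` per pixel. Passing a custom `offset_fn` is where a flow-guided, guidance-aware offset predictor would plug in.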