Learned video compression methods already outperform VVC in the low-delay (LD) case, but the random-access (RA) scenario remains challenging. Most works on learned RA video compression either use HEVC as an anchor or compare against VVC only under specific test conditions, reporting RGB-PSNR instead of Y-PSNR and avoiding comprehensive evaluation. Here, we present an end-to-end learned video codec for random access that combines training on long sequences of frames, rate allocation designed for hierarchical coding, and content adaptation at inference. We show that under common test conditions (JVET-CTC), it achieves results comparable to VTM (the VVC reference software) in terms of YUV-PSNR BD-Rate on some classes of videos, and outperforms it on almost all test sets in terms of VMAF BD-Rate. On average, it surpasses open LD and RA end-to-end solutions in terms of both VMAF and YUV-PSNR BD-Rate.