Video compression has recently seen significant gains in coding performance, driven by new standards and learning-based video codecs. However, most of this work targets application scenarios that tolerate a certain amount of system delay (e.g., the Random Access mode in MPEG codecs), which is not always acceptable for live delivery. This paper presents a comparative study of state-of-the-art conventional and learned video coding methods under a low-delay configuration. Specifically, the study includes two MPEG standard codecs (H.266/VVC VTM and JVET ECM), two AOM codecs (AV1 libaom and AVM), and two recent neural video coding models (DCVC-DC and DCVC-FM). To allow a fair and meaningful comparison, the evaluation was performed on test sequences defined in the AOM and MPEG common test conditions, in the YCbCr 4:2:0 color space. The results show that JVET ECM offers the best overall coding performance among all codecs tested, with an average BD-rate saving (based on PSNR) of 16.1% over AOM AVM and 11.0% over DCVC-FM. We also observed inconsistent performance from the learned video codecs, DCVC-DC and DCVC-FM, on test content with large background motion.
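For readers unfamiliar with the BD-rate figures quoted in the abstract, the following is a minimal sketch of one commonly used way to compute a PSNR-based BD-rate: fit log-bitrate as a cubic polynomial of PSNR for each codec, then average the gap between the two fits over the shared PSNR range. The function name `bd_rate` and the four rate/PSNR points are hypothetical and chosen only for illustration. The abstract does not state which interpolation variant the study used, so this should not be read as the authors' exact procedure.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec relative to the anchor
    at equal PSNR. Negative values mean the test codec saves bitrate.

    Assumes the cubic-fit formulation; other variants (e.g. piecewise
    cubic interpolation) exist and can give slightly different numbers.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log(rate) as a cubic polynomial of PSNR for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")

    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Mean log-rate difference converted back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate points (kbps) and PSNR values (dB), not from the paper.
    anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
    psnr = [32.0, 35.0, 38.0, 40.5]
    # A test codec that needs 10% fewer bits at every PSNR point.
    test_rate = [0.9 * r for r in anchor_rate]
    print(f"BD-rate: {bd_rate(anchor_rate, psnr, test_rate, psnr):.2f}%")
```

In this constructed example the test codec uses 10% fewer bits at every quality point, so the result is approximately -10%. Under this sign convention, a reported "16.1% BD-rate saving" corresponds to a value near -16.1.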