The Versatile Video Coding (VVC) standard was finalized by the Joint Video Experts Team (JVET) in 2020. Compared to the High Efficiency Video Coding (HEVC) standard, VVC offers a compression efficiency gain of about 50% in terms of Bjontegaard Delta-Rate (BD-rate), at the cost of roughly 10 times higher encoder complexity. In this paper, we propose a Convolutional Neural Network (CNN)-based method to speed up inter partitioning in VVC. Our method operates at the Coding Tree Unit (CTU) level: each CTU is divided into a fixed grid of 8x8 blocks, and each cell of this grid is associated with information about the partitioning depth within the area it covers. A lightweight network predicts this grid, and the prediction is used during rate-distortion optimization to limit the Quaternary Tree (QT)-split search and skip partitions that are unlikely to be selected. Experiments show that the proposed method achieves an acceleration of 17% to 30% in the Random Access Group of Pictures 32 (RAGOP32) configuration of the VVC Test Model (VTM) 10, at the cost of a BD-rate increase of 0.37% to 1.18%.
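To make the idea of a per-CTU depth grid more concrete, the following is a minimal sketch of how a predicted grid could be used to prune the QT-split search. It is an illustration under stated assumptions, not the method's actual decision rule, which the abstract does not specify. The CTU size of 128 luma samples, the reading of the grid as cells of 8x8 samples, the function names, and the rule "test a QT split only if some cell inside the block predicts a larger depth" are all hypothetical.

```python
from typing import List

CTU_SIZE = 128   # assumption: CTU size in luma samples
CELL_SIZE = 8    # assumption: each grid cell covers an 8x8 luma area


def max_depth_in_block(depth_grid: List[List[int]], x: int, y: int,
                       size: int, cell_size: int = CELL_SIZE) -> int:
    """Return the largest predicted QT depth over the cells covered by a block.

    (x, y) is the block's top-left corner inside the CTU and `size` is its
    width and height, all in luma samples.
    """
    first_row, first_col = y // cell_size, x // cell_size
    span = max(1, size // cell_size)
    return max(
        depth_grid[r][c]
        for r in range(first_row, first_row + span)
        for c in range(first_col, first_col + span)
    )


def should_test_qt_split(depth_grid: List[List[int]], x: int, y: int,
                         size: int, qt_depth: int) -> bool:
    """Hypothetical pruning rule: test a QT split only when the predicted
    depth somewhere inside the block exceeds the block's current QT depth."""
    return max_depth_in_block(depth_grid, x, y, size) > qt_depth


# Toy prediction: only the top-left 32x32 area is expected to reach QT depth 2.
cells = CTU_SIZE // CELL_SIZE
grid = [[0] * cells for _ in range(cells)]
for r in range(4):
    for c in range(4):
        grid[r][c] = 2

print(should_test_qt_split(grid, 0, 0, 128, 0))   # True: deeper split predicted
print(should_test_qt_split(grid, 64, 64, 64, 1))  # False: QT split can be skipped
print(should_test_qt_split(grid, 0, 0, 32, 2))    # False: predicted depth reached
```

In an encoder, a check of this kind would sit in front of the rate-distortion evaluation of the QT split, so that a `False` result skips that evaluation entirely. A more conservative variant could add a tolerance to the predicted depth, trading some of the speed-up for a smaller BD-rate loss.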