The Versatile Video Coding (VVC) standard was recently finalized by the Joint Video Experts Team (JVET). Compared to the High Efficiency Video Coding (HEVC) standard, VVC offers a compression efficiency gain of about 50% in terms of Bjøntegaard Delta-Rate (BD-rate), at the cost of a 10-fold increase in encoding complexity. In this paper, we propose a method based on a Convolutional Neural Network (CNN) to speed up the inter partitioning process in VVC. First, we introduce a novel representation of the quadtree with nested multi-type tree (QTMT) partition, derived from the partition path. Second, we develop a U-Net-based CNN that takes a multi-scale motion vector field as input at the Coding Tree Unit (CTU) level. The CNN predicts the optimal partition path that the Rate-Distortion Optimization (RDO) process would otherwise find by exhaustive search. To this end, we divide each CTU into a grid and predict the Quaternary Tree (QT) depth and Multi-type Tree (MT) split decisions for each cell of the grid. Third, we introduce an efficient partition pruning algorithm that uses the CNN predictions at each partitioning level to skip the RDO evaluation of unnecessary partition paths. Finally, we design an adaptive threshold selection scheme that makes the trade-off between complexity and coding efficiency scalable. Experiments show that the proposed method achieves an acceleration ranging from 16.5% to 60.2% under the Random Access Group of Pictures 32 (RAGOP32) configuration, with a reasonable coding-efficiency loss ranging from 0.44% to 4.59% in terms of BD-rate, which surpasses other state-of-the-art solutions. In addition, our method is one of the lightest approaches in the field, which supports its applicability to other encoders.
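The threshold-based pruning idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, for illustration only, that the network output for one coding unit has already been reduced to a probability per split mode. The function name `candidate_splits`, the mode labels, and the example numbers are all hypothetical.

```python
from typing import Dict, List

# Hypothetical labels for the split modes considered at one partitioning level.
SPLIT_MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]


def candidate_splits(probs: Dict[str, float], threshold: float) -> List[str]:
    """Return the split modes that should still go through RDO.

    A mode is kept when its predicted probability reaches `threshold`.
    The most probable mode is always kept, so at least one mode is
    evaluated even with a very aggressive threshold.
    """
    best = max(probs, key=probs.get)
    return [
        mode
        for mode in SPLIT_MODES
        if mode in probs and (probs[mode] >= threshold or mode == best)
    ]


# Illustrative probabilities for one coding unit (made-up numbers).
example = {
    "NO_SPLIT": 0.05, "QT": 0.60, "BT_H": 0.20,
    "BT_V": 0.10, "TT_H": 0.03, "TT_V": 0.02,
}

conservative = candidate_splits(example, 0.0)   # nothing is pruned
moderate = candidate_splits(example, 0.15)      # low-probability modes skipped
aggressive = candidate_splits(example, 0.90)    # only the top mode survives
```

Under these assumptions, raising the threshold skips more RDO evaluations and saves more encoding time, at the risk of discarding the mode an exhaustive search would have chosen. This is the kind of complexity versus efficiency trade-off that a threshold selection scheme makes scalable.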