Recursive intra-frame block partitioning is a key component of next-generation video coding standards and has a significant impact on encoding time. In this paper, we propose an encoder-decoder neural network (NN) to accelerate the partitioning decision. Specifically, a convolutional neural network (CNN) compresses the pixel data of the largest coding unit (LCU) into a fixed-length vector. A Transformer decoder then transcribes this fixed-length vector into a variable-length vector that represents the block partitioning of the LCU being encoded. The transcription obeys the constraints imposed by the block partitioning algorithm. Because the NN prediction can be fully parallelized within the intra-mode decision, the decision phase becomes substantially faster. Experimental results on high-definition (HD) sequences show that, compared to AVS3 HPM4.0, the framework reduces encoding time by 87.84\% at a relatively small coding performance loss of 8.09\%.
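To illustrate what transcribing a fixed-length vector into a constrained, variable-length partition sequence can look like, the following is a minimal sketch in plain Python. It is not the network or the rule set used in this work: the split vocabulary (`NO_SPLIT`, `QT`, `BT_H`, `BT_V`), the minimum block size `MIN_SIZE`, the legality rules in `allowed_splits`, and the `scorer` callback (a stand-in for the Transformer decoder's per-step scores) are all assumptions made for illustration. The point it shows is that illegal split modes are masked out at every step, so the emitted sequence always describes a valid partition tree.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical split vocabulary; the real AVS3 partitioning has more modes.
NO_SPLIT, QT, BT_H, BT_V = "NO_SPLIT", "QT", "BT_H", "BT_V"
MIN_SIZE = 8  # assumed minimum block side, for illustration only

# A scorer maps (width, height, tokens emitted so far) to a score per mode.
# It stands in for the decoder network's output at one decoding step.
Scorer = Callable[[int, int, List[str]], Dict[str, float]]


def allowed_splits(w: int, h: int) -> List[str]:
    """Toy legality rules (assumptions, not the AVS3 specification)."""
    modes = [NO_SPLIT]
    if w == h and w // 2 >= MIN_SIZE:
        modes.append(QT)    # quad split only for square blocks
    if h // 2 >= MIN_SIZE:
        modes.append(BT_H)  # horizontal binary split halves the height
    if w // 2 >= MIN_SIZE:
        modes.append(BT_V)  # vertical binary split halves the width
    return modes


def children(w: int, h: int, mode: str) -> List[Tuple[int, int]]:
    """Sub-block sizes produced by applying a split mode to a w x h block."""
    if mode == QT:
        return [(w // 2, h // 2)] * 4
    if mode == BT_H:
        return [(w, h // 2)] * 2
    if mode == BT_V:
        return [(w // 2, h)] * 2
    return []


def decode_partition(w: int, h: int, scorer: Scorer) -> List[str]:
    """Greedy constrained decoding of the partition tree in depth-first order."""
    tokens: List[str] = []
    stack = [(w, h)]
    while stack:
        bw, bh = stack.pop()
        scores = scorer(bw, bh, tokens)
        legal = allowed_splits(bw, bh)
        # Masking step: only legal modes compete, whatever the raw scores say.
        mode = max(legal, key=lambda m: scores.get(m, float("-inf")))
        tokens.append(mode)
        # Push children reversed so the first child is decoded next.
        stack.extend(reversed(children(bw, bh, mode)))
    return tokens


def prefer_quad(w: int, h: int, prefix: List[str]) -> Dict[str, float]:
    """Dummy scorer that always ranks QT highest, even where QT is illegal."""
    return {QT: 3.0, BT_V: 2.0, BT_H: 1.0, NO_SPLIT: 0.0}


print(decode_partition(16, 16, prefer_quad))
```

With the dummy scorer, a 16x16 block is quad-split once and each 8x8 child is forced to `NO_SPLIT`, because no split of an 8x8 block is legal under the assumed rules. The output length depends on the decisions taken, which is why the target sequence is variable-length even though the input representation is fixed-length.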