We address the problem of efficiently compressing video for conferencing-type applications. We build on recent approaches based on image animation, which can achieve good reconstruction quality at very low bitrates by representing face motion with a compact set of sparse keypoints. However, these methods encode video frame by frame, i.e., each frame is reconstructed from a reference frame, which limits the reconstruction quality when more bandwidth is available. Instead, we propose a predictive coding scheme that uses image animation as a predictor and codes the residual with respect to the actual target frame. The residuals can in turn be coded predictively, thus efficiently removing temporal dependencies. Our experiments indicate a significant bitrate gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC, on a dataset of talking-head videos.
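To make the two-level prediction described above concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes frames are flat lists of pixel values, replaces the animation network with a hypothetical stand-in (`toy_predictor`, which merely adds an offset to the reference frame), uses a uniform scalar quantizer, and omits entropy coding entirely. All function names and parameters are illustrative assumptions.

```python
from typing import List, Sequence

Frame = List[float]


def toy_predictor(reference: Sequence[float], offset: float) -> Frame:
    """Hypothetical stand-in for the animation-based predictor.

    A real system would warp the reference frame using keypoints; here we
    only add a scalar offset so the example stays self-contained.
    """
    return [v + offset for v in reference]


def quantize(values: Sequence[float], step: float) -> List[int]:
    """Uniform scalar quantization to integer symbols."""
    return [round(v / step) for v in values]


def dequantize(symbols: Sequence[int], step: float) -> Frame:
    return [s * step for s in symbols]


def encode(frames: Sequence[Frame], reference: Frame,
           offsets: Sequence[float], step: float) -> List[List[int]]:
    """Code each frame's residual predictively from the previous decoded residual."""
    prev_residual = [0.0] * len(reference)  # decoder-side state, mirrored here
    stream = []
    for frame, offset in zip(frames, offsets):
        prediction = toy_predictor(reference, offset)
        residual = [x - p for x, p in zip(frame, prediction)]
        # Second-level prediction: only the change in the residual is coded.
        innovation = [r - q for r, q in zip(residual, prev_residual)]
        symbols = quantize(innovation, step)
        stream.append(symbols)
        # Closed loop: update the state with what the decoder will reconstruct.
        prev_residual = [q + d for q, d in
                         zip(prev_residual, dequantize(symbols, step))]
    return stream


def decode(stream: Sequence[Sequence[int]], reference: Frame,
           offsets: Sequence[float], step: float) -> List[Frame]:
    prev_residual = [0.0] * len(reference)
    decoded = []
    for symbols, offset in zip(stream, offsets):
        prediction = toy_predictor(reference, offset)
        prev_residual = [q + d for q, d in
                         zip(prev_residual, dequantize(symbols, step))]
        decoded.append([p + r for p, r in zip(prediction, prev_residual)])
    return decoded


if __name__ == "__main__":
    reference = [10.0, 20.0, 30.0, 40.0]
    offsets = [0.0, 1.0, 2.0]  # toy "motion" side information per frame
    frames = [[10.4, 20.1, 29.7, 40.2],
              [11.5, 21.0, 30.6, 41.3],
              [12.4, 22.2, 31.8, 42.1]]
    stream = encode(frames, reference, offsets, step=0.5)
    recon = decode(stream, reference, offsets, step=0.5)
    print(stream)
    print(recon)
```

Because the encoder tracks the decoder's reconstructed residual rather than the true one, quantization error does not accumulate across frames: in this sketch the per-pixel reconstruction error stays within half the quantization step.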