Learning-based video compression has been extensively studied in recent years, but it still has limitations in adapting to various motion patterns and entropy models. In this paper, we propose multi-mode video compression (MMVC), a block-wise mode ensemble deep video compression framework that selects, for each block, the optimal feature-domain prediction mode for the motion pattern at hand. The proposed modes are ConvLSTM-based feature-domain prediction, optical-flow-conditioned feature-domain prediction, and feature propagation, which together address a wide range of cases, from static scenes without apparent motion to dynamic scenes with a moving camera. We partition the feature space into blocks and perform temporal prediction on this spatial block-based representation. For entropy coding, we consider both dense and sparse post-quantization residual blocks, and apply optional run-length coding to sparse residuals to improve the compression rate. The result is a dual-mode entropy coding scheme guided by a binary density map, whose rate reduction outweighs the extra cost of transmitting the binary selection map. We validate our scheme on several widely used benchmark datasets. Compared with state-of-the-art video compression schemes and standard codecs, our method yields better or competitive results in terms of PSNR and MS-SSIM.
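As a rough illustration of the dual-mode entropy coding idea described in the abstract, the following Python sketch marks each quantized residual block as dense or sparse in a binary density map and applies run-length coding only to the sparse blocks. It is not the MMVC implementation: the function names, the zero-fraction threshold, and the choice to store dense blocks verbatim (a stand-in for whatever entropy coder handles dense residuals) are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

Run = Tuple[int, int]                      # (value, run_length)
Payload = Union[List[int], List[Run]]      # raw block or run-length pairs


def run_length_encode(values: List[int]) -> List[Run]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def run_length_decode(runs: List[Run]) -> List[int]:
    """Expand (value, run_length) pairs back into a flat list."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def split_blocks(residual: List[List[int]], block: int) -> List[List[int]]:
    """Cut a 2-D residual into block x block tiles, each flattened row-major.

    Assumes the height and width are multiples of `block`.
    """
    h, w = len(residual), len(residual[0])
    return [
        [residual[y][x] for y in range(by, by + block) for x in range(bx, bx + block)]
        for by in range(0, h, block)
        for bx in range(0, w, block)
    ]


def encode_blocks(blocks: List[List[int]], sparsity_threshold: float = 0.75):
    """Return (density_map, payloads).

    density_map[i] == 1: block i is dense and kept verbatim (hypothetical
    placeholder for a dense-block entropy coder).
    density_map[i] == 0: block i is sparse and run-length coded.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    density_map: List[int] = []
    payloads: List[Payload] = []
    for b in blocks:
        zero_fraction = b.count(0) / len(b)
        if zero_fraction >= sparsity_threshold:
            density_map.append(0)
            payloads.append(run_length_encode(b))
        else:
            density_map.append(1)
            payloads.append(list(b))
    return density_map, payloads


def decode_blocks(density_map: List[int], payloads: List[Payload]) -> List[List[int]]:
    """Invert encode_blocks using the density map to pick the decoder per block."""
    return [
        list(p) if d == 1 else run_length_decode(p)
        for d, p in zip(density_map, payloads)
    ]


# Toy 4x4 quantized residual split into 2x2 blocks.
residual = [
    [0, 0, 3, -1],
    [0, 0, 2, 4],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
blocks = split_blocks(residual, block=2)
density_map, payloads = encode_blocks(blocks)
restored = decode_blocks(density_map, payloads)
```

In this toy case only the top-right block is treated as dense. The decoder needs the density map to know which payload format each block uses, which is the side information whose cost the abstract says is outweighed by the rate savings.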