Using more reference frames can significantly improve compression efficiency in neural video compression. However, in low-latency scenarios, most existing neural video compression frameworks use only the previous frame as a reference, and the few frameworks that use multiple previous frames adopt only a simple multi-reference frame propagation mechanism. In this paper, we present a more reasonable multi-reference frame propagation mechanism for neural video compression, called the butterfly multi-reference frame propagation mechanism (Butterfly), which allows more effective feature fusion across multiple reference frames. With it, we can generate a more accurate temporal context conditional prior for the Contextual Coding Module. In addition, when the number of decoded frames is smaller than the required number of reference frames, we duplicate the nearest reference frame to meet the requirement, which works better than duplicating the furthest one. Experimental results show that our method significantly outperforms the previous state of the art (SOTA), and our neural codec achieves a bitrate saving of 7.6% on the HEVC Class D dataset compared with our base single-reference frame model under the same compression configuration.
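The reference-padding rule described above (duplicating the nearest rather than the furthest reference frame when too few frames have been decoded) can be illustrated with a minimal sketch. The function name `pad_reference_frames`, its parameters, and the oldest-to-newest ordering of the returned list are assumptions made for illustration only; the abstract does not specify how the duplicated frames are ordered or how the codec stores its references.

```python
from typing import List, TypeVar

T = TypeVar("T")


def pad_reference_frames(decoded: List[T], num_refs: int,
                         duplicate: str = "nearest") -> List[T]:
    """Return exactly `num_refs` references, ordered oldest to newest.

    `decoded` holds the already-decoded frames (or their features),
    ordered oldest to newest. This is a hypothetical helper, not the
    authors' implementation.
    """
    if not decoded:
        raise ValueError("at least one decoded frame is required")
    if num_refs <= 0:
        raise ValueError("num_refs must be positive")

    # Keep at most the `num_refs` most recent decoded frames.
    refs = list(decoded[-num_refs:])
    missing = num_refs - len(refs)

    if duplicate == "nearest":
        # Fill the gap with copies of the most recent (nearest) frame.
        return refs + [refs[-1]] * missing
    if duplicate == "furthest":
        # Alternative the abstract compares against: copy the oldest frame.
        return [refs[0]] * missing + refs
    raise ValueError(f"unknown duplicate mode: {duplicate!r}")
```

For example, with two decoded frames and four required references, `pad_reference_frames(["f0", "f1"], 4)` yields `["f0", "f1", "f1", "f1"]`, whereas the `"furthest"` mode yields `["f0", "f0", "f0", "f1"]`. Once enough frames are decoded, both modes simply return the most recent `num_refs` frames.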