Over the past few years, learning-based video compression has become an active research area. However, most works focus on P-frame coding; learned B-frame coding remains under-explored and is more challenging. This work introduces a novel B-frame coding framework, termed B-CANF, that exploits conditional augmented normalizing flows. B-CANF additionally features two novel elements: frame-type adaptive coding and B*-frames. Frame-type adaptive coding learns better bit allocation for hierarchical B-frame coding by dynamically adapting the feature distributions according to the B-frame type. B*-frames allow greater flexibility in specifying the group-of-pictures (GOP) structure by reusing the B-frame codec to mimic P-frame coding, without the need for an additional, separate P-frame codec. On commonly used datasets, B-CANF achieves state-of-the-art compression performance among learned B-frame codecs and, in terms of PSNR, shows BD-rate results comparable to HM-16.23 under the random access configuration. When evaluated on different GOP structures, our B*-frames achieve performance similar to that obtained with an additional, separate P-frame codec.
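To make the notion of hierarchical B-frame coding concrete, the following is a minimal sketch of how a dyadic coding order can be derived for one GOP. It is not taken from B-CANF: the function name `hierarchical_b_order`, the power-of-two GOP size, and the level-by-level ordering are assumptions made for illustration, and practical codecs may traverse the same hierarchy in a different (e.g. depth-first) order.

```python
from typing import List, Tuple


def hierarchical_b_order(gop_size: int) -> List[Tuple[int, int, int, int]]:
    """Return (frame, past_ref, future_ref, temporal_level) in coding order.

    Illustrative assumptions (not from the paper):
    - gop_size is a power of two;
    - frames 0 and gop_size are already coded and serve as anchors;
    - frames are visited level by level, coarsest level first.
    """
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two >= 2")

    order = []
    level = 1
    step = gop_size
    while step > 1:
        half = step // 2
        # Each B-frame sits midway between two already-coded references.
        for left in range(0, gop_size, step):
            order.append((left + half, left, left + step, level))
        step = half
        level += 1
    return order


if __name__ == "__main__":
    for frame, past, future, level in hierarchical_b_order(8):
        print(f"B-frame {frame}: refs ({past}, {future}), level {level}")
```

In this sketch, frames at the deepest level are never used as references, while frames at shallower levels are. The temporal level is one plausible way to distinguish B-frame types for bit allocation; the abstract does not state how B-CANF defines them.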