Existing learned video compression models employ flow nets or deformable convolutional networks (DCN) to estimate motion information. However, the limited receptive fields of flow nets and DCN inherently restrict their attention to local contexts. Global contexts, such as large-scale motions and global correlations among frames, are ignored, which is a significant bottleneck for capturing accurate motion. To address this issue, we propose a joint local and global motion compensation module (LGMC) for learned video coding. More specifically, we adopt a flow net for local motion compensation. To capture global context, we employ cross attention in the feature domain for motion compensation. In addition, to avoid the quadratic complexity of vanilla cross attention, we split the softmax operation in attention into two independent softmax operations, leading to linear complexity. To validate the effectiveness of the proposed LGMC, we integrate it with DCVC-TCM and obtain a learned video compression model with joint local and global motion compensation (LVC-LGMC). Extensive experiments demonstrate that LVC-LGMC achieves significant rate-distortion performance improvements over the baseline DCVC-TCM.
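To illustrate how splitting the softmax can turn quadratic cross attention into a linear-cost operation, the following is a minimal sketch in plain Python. It assumes one common factorization: the queries (current-frame features) are normalized over the channel axis, the keys (reference-frame features) are normalized over the position axis, and the keys are multiplied with the values before the queries are applied. The abstract does not specify which axes the two softmax operations act on, so this choice, the function name `linear_cross_attention`, and the helper functions are all illustrative assumptions rather than the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrices a (n x p) and b (p x m), both lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def linear_cross_attention(q, k, v):
    """Cross attention with two independent softmax operations (assumed form).

    q: n_q x d   features of the current frame (queries)
    k: n_k x d   features of the reference frame (keys)
    v: n_k x d_v features of the reference frame (values)
    Returns an n_q x d_v matrix.
    """
    # Softmax 1 (assumption): normalize each query over its d channels.
    q_norm = [softmax(row) for row in q]
    # Softmax 2 (assumption): normalize each key channel over the n_k positions.
    k_norm_t = [softmax(col) for col in transpose(k)]  # d x n_k
    # Summarize the reference frame into a d x d_v global context.
    # Cost is O(n_k * d * d_v), with no n_q x n_k attention map.
    context = matmul(k_norm_t, v)
    # Each query reads from the shared context: O(n_q * d * d_v).
    return matmul(q_norm, context)


if __name__ == "__main__":
    # Toy inputs: 3 query positions, 4 reference positions, 2 channels.
    q = [[0.1, 0.2], [0.3, -0.4], [1.0, 0.5]]
    k = [[0.5, 0.1], [-0.2, 0.3], [0.0, 0.0], [0.7, -0.9]]
    v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    for row in linear_cross_attention(q, k, v):
        print(row)
```

Because the two softmax operations are independent, matrix multiplication is associative across them: computing `q_norm @ (k_norm^T @ v)` gives the same result as `(q_norm @ k_norm^T) @ v`, but the first ordering never materializes the n_q x n_k attention map, so the cost grows linearly with the number of positions instead of quadratically.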