Existing learned video compression models employ optical flow networks or deformable convolutional networks (DCN) to estimate motion information. However, the limited receptive fields of flow networks and DCN inherently confine them to local contexts. Global contexts, such as large-scale motions and global correlations among frames, are ignored, which is a significant bottleneck for capturing accurate motion. To address this issue, we propose a joint local and global motion compensation module (LGMC) for learned video coding. Specifically, we adopt a flow network for local motion compensation. To capture global context, we employ cross attention in the feature domain for motion compensation. In addition, to avoid the quadratic complexity of vanilla cross attention, we decompose the softmax operation in attention into two independent softmax operations, which yields complexity linear in the number of spatial positions. To validate the effectiveness of the proposed LGMC, we integrate it with DCVC-TCM and obtain learned video compression with joint local and global motion compensation (LVC-LGMC). Extensive experiments demonstrate that LVC-LGMC achieves significant rate-distortion performance improvements over the baseline DCVC-TCM.
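To illustrate how splitting the softmax removes the quadratic cost, the following is a minimal NumPy sketch. It assumes an "efficient attention" style factorization: one softmax is taken over the feature axis of the queries and the other over the token axis of the keys. The exact formulation used in LGMC may differ. The sketch omits the learned query/key/value projections. Which features act as queries and which act as keys/values (here, reference-frame features supply the keys and values) is also an illustrative assumption. The function names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vanilla_cross_attention(q, k, v):
    # q: (N, d) query tokens; k: (M, d) and v: (M, d_v) reference tokens.
    # Materializes an N x M attention map: O(N*M*d) time, O(N*M) memory.
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    return attn @ v

def linear_cross_attention(q, k, v):
    # Two independent softmaxes (assumed efficient-attention style):
    q_s = softmax(q, axis=1)   # over the feature axis: each query row sums to 1
    k_s = softmax(k, axis=0)   # over the token axis: each key column sums to 1
    # Aggregate the reference tokens into a small (d, d_v) global context first,
    # so no N x M map is ever formed: O(M*d*d_v) + O(N*d*d_v) time.
    context = k_s.T @ v
    return q_s @ context       # (N, d_v)

# Usage with hypothetical flattened feature maps of size H x W x C.
rng = np.random.default_rng(0)
h, w, c = 16, 16, 32
cur = rng.standard_normal((h * w, c))   # hypothetical query-side features
ref = rng.standard_normal((h * w, c))   # hypothetical reference-frame features
out = linear_cross_attention(cur, ref, ref)
print(out.shape)  # (256, 32)
```

With N = H x W spatial tokens and feature width d, the vanilla version costs O(N^2 d), while the factorized version costs O(N d^2). The factorized version still gives every output position access to all reference positions, because `context` summarizes the whole reference feature map. The implied attention map `softmax(q, axis=1) @ softmax(k, axis=0).T` is row-stochastic, like a standard attention map, but it is never built explicitly.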