Offline and Online Optical Flow Enhancement for Deep Video Compression

Video compression relies heavily on exploiting the temporal redundancy between video frames, which is usually achieved by estimating and using motion information. Most existing deep video compression networks represent motion information as optical flows, and they often adopt pre-trained optical flow estimation networks for motion estimation. These optical flows, however, may be poorly suited to video compression for two reasons. First, optical flow estimation networks are trained to make inter-frame prediction as accurate as possible, but the resulting optical flows may themselves cost too many bits to encode. Second, optical flow estimation networks are trained on synthetic data and may not generalize well to real-world videos. We address these two limitations by enhancing the optical flows in two stages: offline and online. In the offline stage, we fine-tune a trained optical flow estimation network with the motion information provided by a traditional (non-deep) video compression scheme, e.g. H.266/VVC, as we believe the motion information of H.266/VVC achieves a better rate-distortion trade-off. In the online stage, we further optimize the latent features of the optical flows with a gradient descent-based algorithm for the specific video being compressed, so as to make the optical flows more adaptive to that video. We conduct experiments on a state-of-the-art deep video compression scheme, DCVC. Experimental results demonstrate that the proposed offline and online enhancements together achieve an average bitrate saving of 12.8% on the tested videos, without increasing the model or computational complexity at the decoder side.
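The online stage can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an identity "decoder" (the latent is the flow itself), a squared-error distortion against a hypothetical distortion-optimal flow, and an L1 norm as a crude stand-in for the bit cost. The names `rd_loss`, `refine_latent`, `lam`, and `lr` are illustrative only. In the actual scheme the gradient would be taken through the compression network with its own rate estimate; because only the encoder runs this refinement, the decoder is unchanged.

```python
def rd_loss(latent, target_flow, lam):
    """Toy rate-distortion cost: squared error plus lam times an L1 rate proxy."""
    distortion = sum((y - t) ** 2 for y, t in zip(latent, target_flow))
    rate_proxy = sum(abs(y) for y in latent)  # assumption: L1 norm stands in for bits
    return distortion + lam * rate_proxy


def sign(x):
    """Subgradient of |x|; 0 is used at x == 0."""
    return (x > 0) - (x < 0)


def refine_latent(latent, target_flow, lam=0.5, lr=0.05, steps=200):
    """Encoder-side (sub)gradient descent on the latent for one specific input."""
    y = list(latent)
    for _ in range(steps):
        # d/dy of (y - t)^2 + lam * |y|
        grad = [2.0 * (yi - ti) + lam * sign(yi) for yi, ti in zip(y, target_flow)]
        y = [yi - lr * gi for yi, gi in zip(y, grad)]
    return y


if __name__ == "__main__":
    # Start from the distortion-optimal flow, which has zero distortion but a high "rate".
    target = [1.5, -0.8, 0.1, 0.0]
    refined = refine_latent(target, target, lam=0.5)
    print("cost before:", rd_loss(target, target, 0.5))
    print("cost after: ", rd_loss(refined, target, 0.5))
```

In this toy setting, the refinement trades a small amount of prediction accuracy for a lower rate proxy, which mirrors the motivation stated above: a flow that is optimal for prediction alone is not necessarily optimal once its coding cost is counted.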
