We propose a novel deep-learning-based prediction framework for enhanced bandwidth reduction in motion-transfer-enabled video applications such as video conferencing, virtual reality gaming, and privacy preservation for patient health monitoring. To model complex motion, we use the First Order Motion Model (FOMM), which represents dynamic objects using learned keypoints along with their local affine transformations. Keypoints are extracted by a self-supervised keypoint detector and organized into a time series corresponding to the video frames. A Variational Recurrent Neural Network (VRNN) predicts future keypoints, so that the source device can transmit at a lower frame rate. The predicted keypoints are then synthesized into video frames using an optical flow estimator and a generator network. The efficacy of combining keypoint-based representations with VRNN-based prediction for both video animation and reconstruction is demonstrated on three diverse datasets. For real-time applications, our results show that the proposed architecture enables up to 2x additional bandwidth reduction over existing keypoint-based video motion transfer frameworks without significantly compromising video quality.
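To make the bandwidth argument concrete, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It assumes each keypoint is sent as 2 coordinates plus a 2x2 local affine (Jacobian) matrix, i.e. 6 floats, with 10 keypoints per frame and 32-bit floats; these counts, the function name `keypoint_bitrate`, and the `send_every` parameter are illustrative assumptions that do not come from the abstract. Sending keypoints for only every second frame and predicting the rest on the receiver would correspond to the 2x figure quoted above.

```python
def keypoint_bitrate(fps, num_keypoints=10, floats_per_keypoint=6,
                     bytes_per_float=4, send_every=1):
    """Bytes per second needed to transmit keypoints (hypothetical helper).

    send_every=1 transmits keypoints for every frame; send_every=2 transmits
    every second frame and leaves the skipped frames to a predictor on the
    receiving side (a VRNN in the framework described above).
    All default values are assumptions chosen for illustration.
    """
    frames_sent_per_second = fps / send_every
    bytes_per_frame = num_keypoints * floats_per_keypoint * bytes_per_float
    return frames_sent_per_second * bytes_per_frame


# Baseline: keypoints transmitted for all 30 frames each second.
baseline = keypoint_bitrate(fps=30)

# With prediction: keypoints transmitted for every second frame only.
with_prediction = keypoint_bitrate(fps=30, send_every=2)

reduction = baseline / with_prediction
print(f"baseline: {baseline:.0f} B/s, with prediction: {with_prediction:.0f} B/s, "
      f"reduction: {reduction:.1f}x")
```

Under these assumptions the saving scales linearly with `send_every`; in practice the usable value would be bounded by how far ahead the predictor stays accurate enough to preserve video quality.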