Pose and motion priors are crucial for recovering realistic and accurate human motion from noisy observations. Substantial progress has been made on pose and shape estimation from images, and recent works have shown impressive results using priors to refine frame-wise predictions. However, many motion priors model only transitions between consecutive poses and are used in time-consuming optimization procedures, which is problematic for applications requiring real-time motion capture. We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion. As a member of the dynamical variational autoencoder (DVAE) family of models, Motion-DVAE combines the generative capability of VAE models with the temporal modeling of recurrent architectures. Together with Motion-DVAE, we introduce an unsupervised learned denoising method that unifies regression- and optimization-based approaches in a single framework for real-time 3D human pose estimation. Experiments show that the proposed approach achieves performance competitive with state-of-the-art methods while being much faster.