Pose and motion priors are crucial for recovering realistic and accurate human motion from noisy observations. Substantial progress has been made on pose and shape estimation from images, and recent works have shown impressive results by using priors to refine frame-wise predictions. However, many motion priors model only transitions between consecutive poses and are used in time-consuming optimization procedures, which is problematic for the many applications that require real-time motion capture. We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion. As a member of the dynamical variational autoencoder (DVAE) family of models, Motion-DVAE combines the generative capability of VAE models with the temporal modeling of recurrent architectures. Together with Motion-DVAE, we introduce an unsupervised learned denoising method that unifies regression- and optimization-based approaches in a single framework for real-time 3D human pose estimation. Experiments show that the proposed approach achieves performance competitive with state-of-the-art methods while being much faster.