Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, because it keeps the reconstructed human consistent with the ground plane. Prior methods model human-ground interactions either implicitly or sparsely, which often yields unrealistic or incorrect motion under noise and uncertainty. In contrast, our approach represents these interactions explicitly, in a dense and continuous manner. To this end, we propose GraMMaR, a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, which jointly learns the distribution of transitions in both pose and the interaction between every joint and the ground plane at each time step of a motion sequence. It is trained to explicitly promote consistency between the motion and the change in distance to the ground. After training, we establish a joint optimization strategy that uses GraMMaR as a dual prior, regularizing the optimization towards the space of plausible ground-aware motions. This leads to realistic and coherent motion reconstruction, irrespective of whether the ground plane is assumed or learned. Through extensive evaluation on the AMASS and AIST++ datasets, our model demonstrates good generalization and discriminating abilities in challenging cases, including complex and ambiguous human-ground interactions. The code will be released.
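To make the notion of a dense, per-joint interaction with the ground concrete, the following minimal sketch computes the signed distance from every joint to a ground plane at each time step, together with its frame-to-frame change. It is an illustration only and not the GraMMaR implementation: the function names, the `(T, J, 3)` array layout, and the plane parameterization `n · x + d = 0` are assumptions made for this example.

```python
import numpy as np


def joint_ground_distances(joints, plane_normal, plane_offset):
    """Signed distance from every joint to the plane n . x + d = 0.

    joints:       array of shape (T, J, 3), T time steps and J joints
                  (hypothetical layout chosen for this sketch).
    plane_normal: 3-vector n, not necessarily unit length.
    plane_offset: scalar d.
    Returns an array of shape (T, J); values are positive on the side
    of the plane that the normal points towards.
    """
    joints = np.asarray(joints, dtype=float)
    normal = np.asarray(plane_normal, dtype=float)
    norm = np.linalg.norm(normal)
    if norm == 0.0:
        raise ValueError("plane_normal must be non-zero")
    # (T, J, 3) @ (3,) -> (T, J); dividing by ||n|| gives metric distance.
    return (joints @ normal + plane_offset) / norm


def distance_changes(distances):
    """Frame-to-frame change of each joint's distance, shape (T - 1, J)."""
    return np.diff(np.asarray(distances, dtype=float), axis=0)


if __name__ == "__main__":
    # Toy sequence: 3 frames, 2 joints, ground plane y = 0.
    toy_joints = np.array([
        [[0.0, 0.1, 0.0], [0.0, 1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.1, 1.1, 0.0]],
    ])
    dists = joint_ground_distances(toy_joints, [0.0, 1.0, 0.0], 0.0)
    print(dists)
    print(distance_changes(dists))
```

A per-joint, per-frame signal of this kind is one plausible input for a model that relates pose transitions to changes in distance to the ground; how GraMMaR actually encodes and learns the interaction is specified in the paper itself, not here.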