Understanding complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, because it keeps the reconstructed human consistent with the ground plane. Prior methods model human-ground interactions either implicitly or sparsely, which often yields unrealistic or incorrect motion under noise and uncertainty. In contrast, our approach represents these interactions explicitly, in a dense and continuous manner. To this end, we propose GraMMaR, a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, which jointly learns the distribution of transitions in both pose and the interaction between every joint and the ground plane at each time step of a motion sequence. It is trained to explicitly promote consistency between the motion and the change in distance to the ground. After training, we establish a joint optimization strategy that uses GraMMaR as a dual prior, regularizing the optimization towards the space of plausible ground-aware motions. This leads to realistic and coherent motion reconstruction, irrespective of whether the ground plane is assumed or learned. In extensive evaluation on the AMASS and AIST++ datasets, our model demonstrates good generalization and discriminative ability in challenging cases, including complex and ambiguous human-ground interactions. The code will be released.
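As a rough illustration of what a dense, per-joint human-ground interaction signal could look like, the sketch below computes the signed distance from every joint to a ground plane at every frame, along with its frame-to-frame change. It is not the GraMMaR implementation: the plane parameterization (a normal and an offset), the function names, and the toy joint positions are all assumptions made for this example.

```python
import math


def joint_ground_distances(frames, normal, offset):
    """Signed distance of every joint to the plane {x : normal . x + offset = 0}.

    frames: list of frames, each a list of (x, y, z) joint positions.
    normal: plane normal (need not be unit length); offset: plane offset.
    Returns one list of per-joint distances per frame.
    """
    norm = math.sqrt(sum(c * c for c in normal))
    unit = [c / norm for c in normal]
    d = offset / norm
    return [
        [sum(u * x for u, x in zip(unit, joint)) + d for joint in frame]
        for frame in frames
    ]


def distance_changes(distances):
    """Frame-to-frame change in each joint's distance to the ground."""
    return [
        [curr_d - prev_d for prev_d, curr_d in zip(prev, curr)]
        for prev, curr in zip(distances, distances[1:])
    ]


# Toy data (hypothetical): two frames, two joints, ground plane z = 0.
frames = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.5), (1.0, 0.0, 0.25)],
]
dists = joint_ground_distances(frames, normal=(0.0, 0.0, 2.0), offset=0.0)
deltas = distance_changes(dists)
```

In this toy case the first joint moves towards the ground while the second lifts off it. A ground-aware prior of the kind described above would score how plausible such distance changes are given the accompanying pose transition.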