Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR. However, naively learning a mapping between egocentric videos and human motions is challenging, because the user's body is often unobserved by the front-facing camera placed on the head of the user. In addition, collecting large-scale, high-quality datasets with paired egocentric videos and 3D human motions requires accurate motion capture devices, which often limit the variety of scenes in the videos to lab-like environments. To eliminate the need for paired egocentric video and human motions, we propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation. EgoEgo first integrates SLAM and a learning approach to estimate accurate head motion. Subsequently, leveraging the estimated head pose as input, EgoEgo utilizes conditional diffusion to generate multiple plausible full-body motions. This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion, enabling us to leverage large-scale egocentric video datasets and motion capture datasets separately. Moreover, for systematic benchmarking, we develop a synthetic dataset, AMASS-Replica-Ego-Syn (ARES), with paired egocentric videos and human motion. On both ARES and real data, our EgoEgo model performs significantly better than the current state-of-the-art methods.