We propose a method to reconstruct global human trajectories from videos in the wild. Our optimization method decouples the camera and human motion, which allows us to place people in the same world coordinate frame. Most existing methods do not model the camera motion; methods that rely on the background pixels to infer 3D human motion usually require a full scene reconstruction, which is often not possible for in-the-wild videos. However, even when existing SLAM systems cannot recover accurate scene reconstructions, the background pixel motion still provides enough signal to constrain the camera motion. We show that relative camera estimates, combined with data-driven human motion priors, can resolve the scene scale ambiguity and recover global human trajectories. Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as those in PoseTrack. We quantify our improvement over existing methods on the 3D human dataset Egobody. We further demonstrate that our recovered camera scale allows us to reason about the motion of multiple people in a shared coordinate frame, which improves the performance of downstream tracking on PoseTrack. Code and video results can be found at https://vye16.github.io/slahmr.