Although the estimation of 3D human pose and shape (HPS) is rapidly progressing, current methods still cannot reliably estimate moving humans in global coordinates, a capability that is critical for many applications. This is particularly challenging when the camera is also moving, which entangles human and camera motion. To address these issues, we adopt a novel 5D representation (space, time, and identity) that enables end-to-end reasoning about people in scenes. Our method, called TRACE, introduces several novel architectural components. Most importantly, it uses two new "maps" to reason about the 3D trajectory of people over time in both camera and world coordinates. An additional memory unit enables persistent tracking of people even during long occlusions. TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras. By training end-to-end and using full image information, TRACE achieves state-of-the-art performance on tracking and HPS benchmarks. The code and dataset are released for research purposes.