Although the estimation of 3D human pose and shape (HPS) is progressing rapidly, current methods still cannot reliably estimate moving humans in global coordinates, a capability that is critical for many applications. This is particularly challenging when the camera is also moving, which entangles human and camera motion. To address these issues, we adopt a novel 5D representation (space, time, and identity) that enables end-to-end reasoning about people in scenes. Our method, called TRACE, introduces several novel architectural components. Most importantly, it uses two new "maps" to reason about the 3D trajectory of people over time in both camera and world coordinates. An additional memory unit enables persistent tracking of people even during long occlusions. TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras. By training end-to-end and using full image information, TRACE achieves state-of-the-art performance on tracking and HPS benchmarks. The code and dataset are released for research purposes.