We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture a device wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve hand estimation: the resulting kinematic and temporal constraints can reduce world-frame errors in single-frame estimates by 40%. Project page: https://egoallo.github.io/