We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture the wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve the hands: the resulting kinematic and temporal constraints result in over 40% lower hand estimation errors compared to noisy monocular estimates. Project page: https://egoallo.github.io/