Human and environment sensing are two important topics in Computer Vision and Graphics. Human motion is often captured by inertial sensors, while the environment is mostly reconstructed using cameras. We integrate the two techniques in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors, including 6 inertial measurement units (IMUs) and a monocular phone camera. On the one hand, inertial mocap suffers from large translation drift because it lacks a global positioning signal; EgoLocate leverages image-based simultaneous localization and mapping (SLAM) techniques to locate the human in the reconstructed scene. On the other hand, SLAM often fails when visual features are poor; EgoLocate incorporates inertial mocap to provide a strong prior for the camera motion. Experiments show that our technique substantially improves localization, a key challenge in both fields, compared with the state of the art in each field. Our code is available for research at https://xinyu-yi.github.io/EgoLocate/.
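The complementary roles described above can be illustrated with a deliberately simplified 1D toy. This is not EgoLocate's algorithm. It is a minimal sketch that assumes a constant velocity bias stands in for inertial drift and that a camera-based position fix is either available or lost at each step. The function names `dead_reckon` and `fuse` and the constant blending gain `alpha` are hypothetical, chosen only for illustration. Integrating a biased velocity makes the position error grow without bound. Blending in camera fixes keeps the error small, and the inertial prediction carries the estimate through steps where visual tracking is lost.

```python
from typing import List, Optional


def dead_reckon(velocities: List[float], dt: float, bias: float) -> List[float]:
    """Integrate a biased 1D velocity into positions (toy model of inertial drift)."""
    pos, out = 0.0, []
    for v in velocities:
        pos += (v + bias) * dt  # the bias accumulates, so the error grows linearly
        out.append(pos)
    return out


def fuse(velocities: List[float], dt: float, bias: float,
         fixes: List[Optional[float]], alpha: float) -> List[float]:
    """Constant-gain blend of the inertial prediction with optional camera fixes.

    fixes[i] is None when (hypothetically) visual tracking is lost at step i;
    the inertial prediction alone is then used for that step.
    """
    pos, out = 0.0, []
    for v, fix in zip(velocities, fixes):
        pos += (v + bias) * dt  # inertial prediction, always available
        if fix is not None:
            # Camera fix available: pull the estimate toward it to cancel drift.
            pos = (1.0 - alpha) * pos + alpha * fix
        out.append(pos)
    return out


if __name__ == "__main__":
    vel, dt, bias = [1.0] * 100, 0.1, 0.05
    truth = [(i + 1) * dt for i in range(100)]
    # Simulate a visual tracking outage between steps 40 and 59.
    fixes = [None if 40 <= i < 60 else t for i, t in enumerate(truth)]
    dr = dead_reckon(vel, dt, bias)
    fz = fuse(vel, dt, bias, fixes, alpha=0.5)
    print("dead-reckoning final error:", abs(dr[-1] - truth[-1]))
    print("fused final error:", abs(fz[-1] - truth[-1]))
```

A real system would replace the fixed gain with a probabilistic estimator and work with full 3D poses; the constant `alpha` is used here only to keep the example short.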