Scaling up robot learning will likely require human data containing rich, long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to occlusion, and global consistency. We introduce RoSHI, a hybrid wearable that fuses low-cost sparse IMUs with the Project Aria glasses to estimate the wearer's full 3D body pose and shape in a metric global coordinate frame from egocentric perception. The system is motivated by the complementarity of the two sensors: IMUs are robust to occlusions and high-speed motions, while egocentric SLAM anchors long-horizon motion and stabilizes the upper-body pose. We collect a dataset of agile activities to evaluate RoSHI. On this dataset, we generally outperform other egocentric baselines and perform comparably to a state-of-the-art exocentric baseline (SAM3D). Finally, we demonstrate that the motion data recorded by our system are suitable for real-world humanoid policy learning. For videos, data, and more, visit the project webpage: https://roshi-mocap.github.io/