Seamless human-robot manipulation in close proximity relies on accurate forecasts of human motion. While there has been significant progress in learning forecast models at scale, these models accrue high errors at critical transition points when applied to manipulation tasks, which degrades downstream planning performance. Our key insight is that instead of predicting the most likely human motion, it is sufficient to produce forecasts that capture how future human motion would affect the cost of a robot's plan. We present ManiCast, a novel framework that learns cost-aware human forecasts and feeds them to a model predictive control planner to execute collaborative manipulation tasks. Our framework enables fluid, real-time interactions between a human and a 7-DoF robot arm across a number of real-world tasks such as reactive stirring, object handovers, and collaborative table setting. We evaluate both the motion forecasts and the end-to-end forecaster-planner system against a range of learned and heuristic baselines, and we additionally contribute new datasets. We release our code and datasets at https://portal-cornell.github.io/manicast/.
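To make the key insight concrete, the following is a minimal sketch of one way a forecast error could be made cost-aware: errors on joints that matter more to the robot's plan are weighted more heavily. It is an illustration under stated assumptions, not the ManiCast training objective. The function name `cost_weighted_error`, the two-joint (elbow, wrist) skeleton, and the weight values are all hypothetical.

```python
from typing import Sequence

# A forecast is a list of timesteps; each timestep is a list of joints;
# each joint is an (x, y, z) position in metres.
Forecast = Sequence[Sequence[Sequence[float]]]


def cost_weighted_error(
    pred: Forecast, target: Forecast, joint_weights: Sequence[float]
) -> float:
    """Weighted mean of per-joint squared position errors over the horizon.

    joint_weights is a hypothetical stand-in for how strongly each joint
    influences the cost of the robot's plan (for example, a wrist during a
    handover). Uniform weights give a plain mean squared joint error.
    """
    total = 0.0
    weight_sum = 0.0
    for pred_pose, true_pose in zip(pred, target):
        for w, p, t in zip(joint_weights, pred_pose, true_pose):
            total += w * sum((a - b) ** 2 for a, b in zip(p, t))
            weight_sum += w
    return total / weight_sum


# Ground truth: elbow and wrist stay at the origin for two timesteps.
target = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]] * 2
# Forecast A is 10 cm off on the wrist; forecast B is 10 cm off on the elbow.
wrist_off = [[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]] * 2
elbow_off = [[(0.1, 0.0, 0.0), (0.0, 0.0, 0.0)]] * 2

uniform = [1.0, 1.0]      # treats every joint alike
wrist_heavy = [1.0, 9.0]  # assumed: the wrist dominates the planner's cost

uniform_a = cost_weighted_error(wrist_off, target, uniform)
uniform_b = cost_weighted_error(elbow_off, target, uniform)
weighted_a = cost_weighted_error(wrist_off, target, wrist_heavy)
weighted_b = cost_weighted_error(elbow_off, target, wrist_heavy)
```

Under uniform weights the two forecasts are indistinguishable, whereas the wrist-heavy weighting penalizes the forecast that misplaces the joint assumed to matter to the planner. In practice such weights would have to be derived from the planner's actual cost function, which this sketch does not model.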