Seamless human-robot manipulation in close proximity relies on accurate forecasts of human motion. While there has been significant progress in learning forecast models at scale, these models accrue high errors at critical transition points when applied to manipulation tasks, which degrades downstream planning performance. Our key insight is that instead of predicting the most likely human motion, it is sufficient to produce forecasts that capture how future human motion would affect the cost of a robot's plan. We present ManiCast, a novel framework that learns cost-aware human forecasts and feeds them to a model predictive control planner to execute collaborative manipulation tasks. Our framework enables fluid, real-time interactions between a human and a 7-DoF robot arm across a number of real-world tasks such as reactive stirring, object handovers, and collaborative table setting. We evaluate both the motion forecasts and the end-to-end forecaster-planner system against a range of learned and heuristic baselines, and we contribute new datasets. We release our code and datasets at https://portal-cornell.github.io/manicast/.
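To make the key insight concrete, the following is a minimal sketch, not the ManiCast implementation. It assumes a hypothetical plan cost (`collision_cost`, a squared penalty for entering a safety radius around the human wrist) and hypothetical 2D trajectories, and compares two forecasts that are equally wrong in average displacement but differ in how much they change the cost of the robot's plan.

```python
import math

def collision_cost(robot_plan, human_traj, safe_dist=0.3):
    """Hypothetical plan cost: squared intrusion into a safety radius, summed over time."""
    total = 0.0
    for robot_pos, human_pos in zip(robot_plan, human_traj):
        intrusion = safe_dist - math.dist(robot_pos, human_pos)
        total += max(0.0, intrusion) ** 2
    return total

def displacement_error(forecast, truth):
    """Average per-step distance between forecast and true human positions."""
    return sum(math.dist(f, t) for f, t in zip(forecast, truth)) / len(truth)

def cost_error(robot_plan, forecast, truth):
    """How much the forecast error changes the (assumed) cost of the robot's plan."""
    return abs(collision_cost(robot_plan, forecast) - collision_cost(robot_plan, truth))

# Illustrative data (assumed): the robot holds position while the human reaches in.
robot_plan = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
truth      = [(1.0, 0.0), (0.5, 0.0), (0.2, 0.0)]

# Forecast A is off by 0.1 m while the human is still far from the robot.
forecast_a = [(1.1, 0.0), (0.5, 0.0), (0.2, 0.0)]
# Forecast B is off by 0.1 m at the moment the human enters the safety radius.
forecast_b = [(1.0, 0.0), (0.5, 0.0), (0.3, 0.0)]

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name,
          "displacement error:", round(displacement_error(forecast, truth), 4),
          "cost error:", round(cost_error(robot_plan, forecast, truth), 4))
```

Under these assumptions both forecasts have the same displacement error, but only forecast B misjudges the plan cost, because its error falls at the transition where the human enters the robot's workspace. A cost-aware objective would penalize B more heavily than A, whereas a purely displacement-based objective treats them alike.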