We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle the diverse human behaviors shown in the demonstrations and remain robust when humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations, while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives both in simulated evaluations and when executing the tasks with a real human operator in the loop. Supplementary materials and videos are available at https://sites.google.com/view/co-gail-web/home