Modeling multimodal human behavior has been a key barrier to increasing the level of interaction between humans and robots, particularly for collaborative tasks. Our key insight is that an effective learned robot policy for human-robot collaborative tasks must express a high degree of multimodality, predict actions in a temporally consistent manner, and recognize a wide range of frequencies of human actions in order to integrate seamlessly with a human in the control loop. We present Diffusion Co-policy, a method for planning sequences of actions that synergize well with humans at test time. The co-policy predicts joint human-robot action sequences with a Transformer-based diffusion model trained on a dataset of collaborative human-human demonstrations, and directly executes the robot actions in a receding horizon control framework. We demonstrate in both simulation and real environments that the method outperforms other state-of-the-art learning methods on the task of human-robot table-carrying with a human in the loop. Moreover, we qualitatively highlight compelling robot behaviors that provide evidence of true human-robot collaboration, including mutual adaptation, shared task understanding, leadership switching, and low levels of wasteful interaction forces arising from dissent.
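The receding horizon execution described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the learned Transformer-based diffusion model is replaced by a hand-written stand-in (`toy_joint_predictor`), the table is reduced to a 2D point, and the names `lagging_human` and `run_receding_horizon`, along with the horizon, step cap, and tolerance values, are hypothetical. The sketch only shows the control pattern: predict a joint human-robot action sequence, execute just the first few robot actions while the real human acts independently, then replan from the new observation.

```python
import math
from typing import Callable, List, Tuple

Vec = Tuple[float, float]
JointAction = Tuple[Vec, Vec]  # (robot_action, predicted_human_action)


def _half_step(pos: Vec, goal: Vec, cap: float = 0.1) -> Vec:
    """Half of a per-axis capped step from pos toward goal."""
    return tuple(max(-cap, min(cap, g - p)) / 2 for p, g in zip(pos, goal))


def toy_joint_predictor(pos: Vec, goal: Vec, horizon: int) -> List[JointAction]:
    """Hypothetical stand-in for the learned model (NOT a diffusion model).

    Returns `horizon` joint actions and assumes the human mirrors the robot.
    A real co-policy would instead sample this sequence from a trained model.
    """
    plan = []
    for _ in range(horizon):
        step = _half_step(pos, goal)
        plan.append((step, step))
        # Roll the predicted table position forward under both actions.
        pos = (pos[0] + 2 * step[0], pos[1] + 2 * step[1])
    return plan


def lagging_human(pos: Vec, goal: Vec) -> Vec:
    """Hypothetical human who contributes only half of what the plan expects."""
    sx, sy = _half_step(pos, goal)
    return (sx / 2, sy / 2)


def run_receding_horizon(
    start: Vec,
    goal: Vec,
    human_policy: Callable[[Vec, Vec], Vec],
    horizon: int = 8,
    execute_steps: int = 4,
    max_steps: int = 200,
    tol: float = 0.05,
) -> List[Vec]:
    """Plan `horizon` joint actions, execute the first `execute_steps`
    robot actions, then replan from the observed table position."""
    pos = start
    trajectory = [pos]
    while len(trajectory) <= max_steps:
        plan = toy_joint_predictor(pos, goal, horizon)
        for robot_action, _predicted_human in plan[:execute_steps]:
            # The real human acts on their own; the prediction may be wrong.
            human_action = human_policy(pos, goal)
            pos = (
                pos[0] + robot_action[0] + human_action[0],
                pos[1] + robot_action[1] + human_action[1],
            )
            trajectory.append(pos)
            if math.dist(pos, goal) <= tol or len(trajectory) > max_steps:
                return trajectory
    return trajectory


if __name__ == "__main__":
    traj = run_receding_horizon((0.0, 0.0), (1.0, 0.5), lagging_human)
    print(f"steps taken: {len(traj) - 1}, final position: {traj[-1]}")
```

In this toy setup the human deliberately deviates from the predicted human actions, so each open-loop segment drifts from the plan. Replanning every `execute_steps` steps from the observed position is what corrects that drift, which is the motivation for executing only a prefix of each predicted sequence.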