Tool use, a hallmark feature of human intelligence, remains a challenging problem in robotics due to the complex contacts and high-dimensional action space involved. In this work, we present a novel method that enables reinforcement learning of tool-use behaviors. Our approach provides a scalable way to learn how to operate tools in a new category from only a single demonstration. To this end, we propose a new method for generalizing grasping configurations of multi-fingered robotic hands to novel objects. The generalized grasps guide the policy search through favorable initializations and a shaped reward signal. The learned policies solve complex tool-use tasks and generalize to unseen tools at test time. Visualizations and videos of the trained policies are available at https://maltemosbach.github.io/generalizable_tool_use.
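The abstract does not specify how a generalized grasp enters the policy search. The sketch below illustrates one plausible reading under loudly stated assumptions. It assumes the transferred grasp is a vector of hand joint angles (`q_grasp`, a hypothetical name). It assumes "favorable initialization" means starting episodes near that grasp. For the shaped reward it swaps in standard potential-based shaping, using the negative joint-space distance to the grasp as the potential. This is an illustration of the general idea, not the authors' reward or initialization scheme.

```python
import math
import random
from typing import List, Optional, Sequence


def grasp_distance(q: Sequence[float], q_grasp: Sequence[float]) -> float:
    """Euclidean distance between the current and target hand joint configurations."""
    if len(q) != len(q_grasp):
        raise ValueError("configurations must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, q_grasp)))


def shaped_reward(
    task_reward: float,
    q: Sequence[float],
    q_next: Sequence[float],
    q_grasp: Sequence[float],
    gamma: float = 0.99,
    weight: float = 1.0,
) -> float:
    """Task reward plus potential-based shaping (assumed, not the paper's reward).

    The potential is Phi(q) = -weight * distance(q, q_grasp), so transitions
    that move the hand toward the transferred grasp earn a positive bonus.
    """
    phi = -weight * grasp_distance(q, q_grasp)
    phi_next = -weight * grasp_distance(q_next, q_grasp)
    return task_reward + gamma * phi_next - phi


def initial_configuration(
    q_grasp: Sequence[float],
    noise_std: float = 0.05,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sample an episode start near the transferred grasp (hypothetical scheme)."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, noise_std) for x in q_grasp]


if __name__ == "__main__":
    q_grasp = [0.0, 0.5, 1.0]  # hypothetical 3-joint target grasp
    q0 = initial_configuration(q_grasp, rng=random.Random(0))
    # Moving from q0 exactly onto the grasp yields a positive shaping bonus.
    print(shaped_reward(0.0, q0, q_grasp, q_grasp, gamma=1.0))
```

Potential-based shaping is used here because it rewards progress toward the grasp without changing which policies are optimal for the underlying task. A distance-based bonus added directly to the reward would not offer that guarantee.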