Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control a large number of degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and its capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the need to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL that alleviates these challenges. The main idea of our approach is to combine recent advances in sample-efficient RL with replay buffer bootstrapping. This combination allows us to use data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
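To make the idea of replay buffer bootstrapping concrete, the following is a minimal sketch of seeding a buffer with transitions from earlier tasks before training on a new one. It is not the system described above: the names `Transition` and `ReplayBuffer` are hypothetical, and the FIFO eviction and uniform sampling over prior and new data are assumptions chosen for simplicity.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One environment step (hypothetical layout, for illustration only)."""
    obs: Sequence[float]
    action: Sequence[float]
    reward: float
    next_obs: Sequence[float]
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO buffer that can be seeded with prior data."""

    def __init__(self, capacity: int, prior: Sequence[Transition] = ()):
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        # Bootstrapping step: copy transitions collected on earlier tasks or
        # objects into the buffer before any data for the new task exists.
        self._storage.extend(prior)

    def add(self, transition: Transition) -> None:
        # New-task data is appended; the oldest entry is evicted when full.
        self._storage.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Assumption: uniform sampling with replacement over all stored data,
        # so early batches consist mostly of prior-task transitions.
        return [rng.choice(self._storage) for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self._storage)


# Usage: 100 placeholder transitions stand in for data from a previous task.
prior = [Transition((0.0,), (0.1,), 0.0, (0.1,), False) for _ in range(100)]
buffer = ReplayBuffer(capacity=1000, prior=prior)
buffer.add(Transition((1.0,), (0.2,), 1.0, (1.1,), True))  # first new-task step
batch = buffer.sample(32, random.Random(0))
```

In this sketch the learner can draw full batches from the first step of the new task, which is the intuition behind using past data as a starting point. How prior and new data are actually mixed in a real system is a design choice that this example does not settle.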