Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation across diverse hardware and simulator setups, in our case the Allegro Hand and GPU-based Isaac Gym simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly available, affordable robot hands and cameras. Videos of the resulting policy and supplementary information, including experiments and demos, can be found at https://dextreme.org/