Deep Reinforcement Learning (DRL) has proven effective for learning control policies with robotic grippers, but it is much less practical for grasping with dexterous hands, especially on real robotic platforms, because of the high dimensionality of the problem. In this work, we focus on the multi-fingered grasping task with the anthropomorphic hand of the iCub humanoid. We propose the RESidual learning with PREtrained CriTics (RESPRECT) method which, starting from a policy pre-trained on a large set of objects, learns a residual policy to grasp a novel object in a fraction of the timesteps required to train a policy from scratch ($\sim 5 \times$ faster), without requiring any task demonstrations. To our knowledge, this is the first Residual Reinforcement Learning (RRL) approach that learns a residual policy on top of another policy pre-trained with DRL. During residual learning, we exploit some components of the pre-trained policy, which further speeds up training. We benchmark our results in the iCub simulated environment, and we show that RESPRECT can be effectively used to learn a multi-fingered grasping policy on the real iCub robot. The code to reproduce the experiments is released together with the paper under an open-source license.
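The core idea of residual RL can be illustrated with a minimal sketch of how an action is composed: a frozen, pre-trained base policy proposes an action, and only a residual policy, trained on the new object, adds a correction on top. This is a generic residual-RL sketch, not the RESPRECT implementation. The function name `residual_action`, the element-wise sum, and the clipping to a normalized range `[low, high]` are illustrative assumptions; in particular, the residual here sees only the observation, whereas some residual-RL variants also feed it the base action.

```python
from typing import Callable, List, Sequence

Action = List[float]


def residual_action(
    base_policy: Callable[[Sequence[float]], Action],
    residual_policy: Callable[[Sequence[float]], Action],
    obs: Sequence[float],
    low: float = -1.0,
    high: float = 1.0,
) -> Action:
    """Hypothetical helper: add a learned residual to a frozen base action.

    Assumes actions are normalized to [low, high] and that both policies
    return vectors of the same length.
    """
    base = base_policy(obs)        # pre-trained policy, kept frozen
    delta = residual_policy(obs)   # only this policy is updated on the novel object
    if len(base) != len(delta):
        raise ValueError("base and residual actions must have the same length")
    # Element-wise sum, clipped so the executed action stays within bounds.
    return [min(high, max(low, b + d)) for b, d in zip(base, delta)]


if __name__ == "__main__":
    frozen_base = lambda o: [0.5, -0.9]
    residual = lambda o: [0.7, -0.3]
    print(residual_action(frozen_base, residual, [0.0]))  # [1.0, -1.0]
```

With a residual that outputs zeros, the composed action equals the base action, so learning can start from the pre-trained behaviour rather than from a random policy. This is the intuition behind needing fewer timesteps than training from scratch.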