Multi-fingered robotic hands could enable robots to perform sophisticated manipulation tasks. However, teaching a robot to grasp objects with an anthropomorphic hand is a hard problem because of the high dimensionality of the state and action spaces. Deep Reinforcement Learning (DRL) offers techniques to design control policies for this kind of problem without explicit modeling of the environment or the hand. Training such policies with state-of-the-art model-free algorithms, however, remains very challenging for multi-fingered hands. The main difficulty is that the environment cannot be explored efficiently in such high-dimensional problems, which causes issues in the initial phases of policy optimization. One way to address this is to rely on offline task demonstrations, but this is often very demanding in terms of time and computational resources. In this work, we overcome these requirements and propose the A Grasp Pose is All You Need (G-PAYN) method for the anthropomorphic hand of the iCub humanoid. We develop an approach to automatically collect task demonstrations, which are used to initialize the training of the policy. The proposed grasping pipeline starts from a grasp pose generated by an external algorithm, which is used to initiate the movement. A control policy, previously trained with the proposed G-PAYN, is then used to reach and grab the object. We deploy the iCub in the MuJoCo simulator and use it to test our approach with objects from the YCB-Video dataset. The results show that G-PAYN outperforms the current DRL techniques used as baselines in the considered setting, in terms of both success rate and execution time. The code to reproduce the experiments will be released upon acceptance.
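As an illustration of the general idea of initializing policy training with automatically collected demonstrations, the following minimal sketch fills a replay buffer with transitions produced by a scripted controller before any learning starts. It is not the G-PAYN implementation: the one-dimensional `ToyReachEnv`, the `scripted_controller`, and all other names and parameters are hypothetical stand-ins chosen for this example, and the real system would use the simulated iCub hand and a grasp pose from an external algorithm.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class Transition:
    state: float
    action: float
    reward: float
    next_state: float
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO buffer, as used by off-policy DRL algorithms."""

    def __init__(self, capacity: int) -> None:
        self._storage: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


class ToyReachEnv:
    """Hypothetical 1-D stand-in: the state is the signed distance to the grasp pose."""

    MAX_ACTION = 0.1

    def __init__(self, start: float, tolerance: float = 0.05, max_steps: int = 50) -> None:
        self.start = start
        self.tolerance = tolerance
        self.max_steps = max_steps
        self.state = start
        self.steps = 0

    def reset(self) -> float:
        self.state = self.start
        self.steps = 0
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool]:
        # Clip the action, apply it, and give a sparse reward on success.
        action = max(-self.MAX_ACTION, min(self.MAX_ACTION, action))
        self.state += action
        self.steps += 1
        reached = abs(self.state) <= self.tolerance
        done = reached or self.steps >= self.max_steps
        return self.state, (1.0 if reached else 0.0), done


def scripted_controller(state: float) -> float:
    """Assumed demonstrator: move straight toward the target, at most 0.1 per step."""
    return max(-ToyReachEnv.MAX_ACTION, min(ToyReachEnv.MAX_ACTION, -state))


def collect_demonstrations(env: ToyReachEnv, buffer: ReplayBuffer, episodes: int) -> int:
    """Run the scripted controller, store every transition, return the success count."""
    successes = 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = scripted_controller(state)
            next_state, reward, done = env.step(action)
            buffer.add(Transition(state, action, reward, next_state, done))
            state = next_state
        successes += int(reward > 0.0)
    return successes


if __name__ == "__main__":
    demo_buffer = ReplayBuffer(capacity=1000)
    n_success = collect_demonstrations(ToyReachEnv(start=0.5), demo_buffer, episodes=3)
    print(f"{n_success} successful demonstrations, {len(demo_buffer)} transitions stored")
```

An off-policy learner would then draw its first mini-batches from this pre-filled buffer with `sample`, so that early updates see successful transitions even though random exploration rarely produces them in a high-dimensional action space.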