Multi-fingered robotic hands have the potential to enable robots to perform sophisticated manipulation tasks. However, teaching a robot to grasp objects with an anthropomorphic hand is an arduous problem due to the high dimensionality of the state and action spaces. Deep Reinforcement Learning (DRL) offers techniques to design control policies for this kind of problem without explicitly modeling the environment or the hand. However, state-of-the-art model-free algorithms have proven inefficient at learning such policies. The main problem is that exploring the environment is infeasible in such high-dimensional problems, which hampers the initial phases of policy optimization. One way to address this is to rely on offline task demonstrations, but this is often too demanding in terms of time and computational resources. To address these problems, we propose A Grasp Pose is All You Need (G-PAYN), a method for the anthropomorphic hand of the iCub humanoid. We develop an approach that automatically collects task demonstrations to initialize the training of the policy. The proposed grasping pipeline starts from a grasp pose generated by an external algorithm, which is used to initiate the movement. Then a control policy, previously trained with G-PAYN, is used to reach and grab the object. We deployed the iCub in the MuJoCo simulator and used it to test our approach with objects from the YCB-Video dataset. Results show that, in the considered setting, G-PAYN outperforms the current DRL techniques used as baselines in terms of both success rate and execution time. The code to reproduce the experiments is released together with the paper under an open-source license.