Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms because successful experiences are collected inefficiently. Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by exploiting failed trajectories, replacing the desired goal with one of the achieved states so that any failed trajectory still contributes to learning. However, HER samples failed trajectories uniformly, without considering which ones might be the most valuable for learning. In this paper, we address this problem and propose Contact Energy Based Prioritization (CEBP), a novel approach that selects samples from the replay buffer based on contact-rich information, leveraging the touch sensors in the robot's gripper and object displacement. Our prioritization scheme favors sampling contact-rich experiences, which arguably provide the largest amount of information. We evaluate the proposed approach on various sparse-reward robotic tasks and compare it with state-of-the-art methods, showing that it surpasses or performs on par with them on robot manipulation tasks. Finally, we deploy the trained policy on a real Franka robot for a pick-and-place task and observe that the robot solves the task successfully. Videos and code are publicly available at: https://erdiphd.github.io/HER_force
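The prioritization idea described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes a "contact energy" score per trajectory computed from gripper contact-force magnitudes and object displacements (the function names and the softmax weighting are illustrative assumptions), and samples replay-buffer trajectories with probability proportional to that score.

```python
import numpy as np

def contact_energy_priorities(forces, displacements, temperature=1.0):
    """Hypothetical per-trajectory contact-energy score: sum over
    timesteps of gripper contact-force magnitude times object
    displacement magnitude (the paper's exact formula may differ)."""
    energies = np.array([
        np.sum(np.linalg.norm(f, axis=-1) * np.linalg.norm(d, axis=-1))
        for f, d in zip(forces, displacements)
    ])
    # Softmax over energies yields sampling probabilities that
    # favor contact-rich trajectories over contact-free ones.
    logits = energies / temperature
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def sample_trajectories(probs, batch_size, seed=None):
    """Draw trajectory indices from the replay buffer according
    to the contact-energy-based priorities."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=batch_size, p=probs)
```

A trajectory with sustained contact while the object moves receives high energy and is sampled more often, whereas a trajectory where the gripper never touches the object gets a low weight, matching the intuition that contact-rich experiences carry more learning signal.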