Although reinforcement-learning-based algorithms have achieved superhuman performance in many domains, robotics remains challenging because the state and action spaces are continuous and the reward function is predominantly sparse. In this work, we propose: 1) HiER, a highlight experience replay that creates a secondary replay buffer for the most relevant experiences; 2) E2H-ISE, an easy-to-hard data-collection curriculum-learning method that controls the entropy of the initial state-goal distribution and, through it, indirectly the task difficulty; and 3) HiER+, the combination of HiER and E2H-ISE. All three can be applied with or without hindsight experience replay (HER) and prioritized experience replay (PER). While both HiER and E2H-ISE surpass the baselines, HiER+ improves the results further and significantly outperforms the state of the art on the push, slide, and pick-and-place robotic manipulation tasks. Our implementation and further media materials are available on the project site.
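To make the two ideas concrete, the following is a minimal Python sketch and not the implementation from the project site. It makes several assumptions that the abstract does not state: an experience counts as "relevant" when its episode return exceeds a fixed threshold (`return_threshold`), batches mix the two buffers at a fixed ratio (`highlight_ratio`), and the entropy of the initial state-goal distribution is controlled by scaling the width of a uniform sampling range with a factor `c` in [0, 1]. All class, function, and parameter names are hypothetical.

```python
import random
from collections import deque


class HighlightReplay:
    """Standard replay buffer plus a secondary 'highlight' buffer (sketch)."""

    def __init__(self, capacity=100_000, highlight_capacity=10_000,
                 return_threshold=-10.0, highlight_ratio=0.5, seed=0):
        self.standard = deque(maxlen=capacity)
        self.highlight = deque(maxlen=highlight_capacity)
        # Assumed relevance criterion: episode return above this value.
        self.return_threshold = return_threshold
        # Assumed fixed share of each batch drawn from the highlight buffer.
        self.highlight_ratio = highlight_ratio
        self.rng = random.Random(seed)

    def add_episode(self, transitions):
        """Store an episode given as (state, action, reward, next_state, done) tuples."""
        self.standard.extend(transitions)
        episode_return = sum(t[2] for t in transitions)
        if episode_return > self.return_threshold:
            # Relevant episodes are additionally kept in the secondary buffer.
            self.highlight.extend(transitions)
        return episode_return

    def sample(self, batch_size):
        """Draw a mixed batch (with replacement) from both buffers."""
        n_high = min(int(batch_size * self.highlight_ratio), len(self.highlight))
        n_std = batch_size - n_high
        batch = self.rng.choices(self.highlight, k=n_high) if n_high else []
        batch += self.rng.choices(self.standard, k=n_std)
        return batch


def sample_initial_state_goal(c, rng, center=(0.0, 0.0), half_range=0.15):
    """Draw an initial object position and a goal from a square scaled by c.

    c = 0 collapses both to `center` (lowest entropy, easiest case here);
    c = 1 uses the full range (highest entropy, hardest case here).
    """
    c = min(max(c, 0.0), 1.0)

    def draw():
        return tuple(x + rng.uniform(-c * half_range, c * half_range) for x in center)

    return draw(), draw()


if __name__ == "__main__":
    buffer = HighlightReplay(return_threshold=-3.0)
    # Sparse reward convention assumed here: -1 per step, 0 on success.
    buffer.add_episode([("s", "a", -1.0, "s2", False)] * 5)   # return -5, not highlighted
    buffer.add_episode([("s", "a", -1.0, "s2", False),
                        ("s2", "a", 0.0, "g", True)])          # return -1, highlighted
    print(len(buffer.standard), len(buffer.highlight), len(buffer.sample(4)))
    print(sample_initial_state_goal(0.5, random.Random(0)))
```

In this sketch, combining the two pieces in the spirit of HiER+ would mean collecting episodes with `sample_initial_state_goal` while raising `c` over the course of training, and training the agent on batches drawn with `HighlightReplay.sample`. How the threshold, the mixing ratio, and the schedule for `c` are actually set is not specified in the abstract.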