Even though reinforcement-learning-based algorithms have achieved superhuman performance in many domains, robotics poses significant challenges: the state and action spaces are continuous, and the reward function is predominantly sparse. Furthermore, in many cases, the agent has no access to any form of demonstration. Inspired by human learning, in this work we propose a method named highlight experience replay (HiER), which creates a secondary highlight replay buffer for the most relevant experiences. For the weight updates, transitions are sampled from both the standard and the highlight experience replay buffers. HiER can be applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves on the state of the art, validated on eight tasks across three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+, in which HiER is enhanced with an arbitrary data-collection curriculum learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/.
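The core mechanism, a secondary highlight buffer sampled alongside the standard one at update time, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the `highlight_ratio` mixing parameter, and the criterion for marking a transition as a highlight are all assumptions for the sake of the example.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch of HiER-style dual-buffer sampling.

    Assumptions (not from the paper): a fixed mixing ratio decides how much
    of each batch comes from the highlight buffer, and the caller flags
    which transitions count as highlights.
    """

    def __init__(self, capacity=10_000, highlight_ratio=0.5):
        self.standard = deque(maxlen=capacity)
        self.highlight = deque(maxlen=capacity)
        self.highlight_ratio = highlight_ratio  # fraction of each batch from highlights

    def store(self, transition, is_highlight=False):
        # Every transition enters the standard buffer; relevant ones are
        # additionally copied into the highlight buffer.
        self.standard.append(transition)
        if is_highlight:
            self.highlight.append(transition)

    def sample(self, batch_size):
        # Mix transitions from both buffers; fall back to the standard
        # buffer alone while the highlight buffer is still empty.
        n_hi = int(batch_size * self.highlight_ratio) if self.highlight else 0
        batch = random.choices(self.standard, k=batch_size - n_hi)
        if n_hi:
            batch += random.choices(self.highlight, k=n_hi)
        random.shuffle(batch)
        return batch
```

A sampled batch then feeds the usual off-policy update; since the highlight buffer holds copies, the scheme composes naturally with HER-style relabeling or PER-style priorities applied on top.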