Reinforcement learning (RL) often struggles with sparse-reward, long-horizon tasks in complex environments. Goal-conditioned reinforcement learning (GCRL) tackles this problem through a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find a path to the desired goal, and doing so efficiently remains one of the most challenging issues in the field. Several goal exploration methods have been proposed, but they still struggle to find the desired goals efficiently. In this paper, we propose a novel learning objective that optimizes the entropy of both the achieved goals and the new goals to be explored, enabling more efficient goal exploration in sub-goal-selection-based GCRL. To optimize this objective, we first mine frequently occurring goal-transition patterns in environments similar to the current task and use them to compose skills via skill learning. The pretrained skills are then applied in goal exploration. Evaluation on a variety of sparse-reward, long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance. The source code is available at: https://github.com/GEAPS/GEAPS.
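To make the idea of an entropy objective over achieved and new goals concrete, the following is a minimal illustrative sketch, not the method implemented in the GEAPS repository. It assumes that goals are low-dimensional coordinates, that the goal distribution is estimated by counting goals in grid cells, and that a candidate goal is scored by how much it would raise the entropy of that estimate. The names `goal_entropy`, `entropy_gain`, and `cell_size` are hypothetical.

```python
import math
from collections import Counter


def goal_entropy(goals, cell_size=1.0):
    """Shannon entropy (in nats) of goals after binning them into grid cells.

    Assumption: a histogram over grid cells stands in for the goal density.
    """
    cells = Counter(
        tuple(math.floor(x / cell_size) for x in goal) for goal in goals
    )
    total = sum(cells.values())
    return -sum((n / total) * math.log(n / total) for n in cells.values())


def entropy_gain(achieved, new_goal, cell_size=1.0):
    """Change in entropy if new_goal were added to the achieved goals."""
    before = goal_entropy(achieved, cell_size)
    after = goal_entropy(achieved + [new_goal], cell_size)
    return after - before


# Hypothetical data: most achieved goals cluster in one cell.
achieved = [(0.2, 0.1), (0.4, 0.3), (0.6, 0.8), (1.5, 0.2)]
candidates = [(0.5, 0.5), (2.5, 2.5)]

# Pick the candidate sub-goal that spreads the goal distribution the most.
best = max(candidates, key=lambda g: entropy_gain(achieved, g))
print(best)
```

Under these assumptions, a candidate that falls in an already crowded cell lowers the entropy, while one in an unvisited cell raises it, so the selection favors novel sub-goals.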