Exploration efficiency is a significant challenge in goal-conditioned reinforcement learning (GCRL), particularly in tasks with long horizons and sparse rewards. A primary limitation is the agent's inability to exploit structural patterns in the environment. In this study, we introduce a novel framework, GEASD, designed to capture these patterns through a skill distribution that adapts during learning. This distribution optimizes the local entropy of achieved goals within a contextual horizon, which enhances goal-spreading behaviors and facilitates deep exploration in states containing familiar structural patterns. Our experiments show marked improvements in exploration efficiency with the adaptive skill distribution compared to a uniform skill distribution. Additionally, the learned skill distribution generalizes robustly, achieving substantial exploration progress in unseen tasks containing similar local structures.
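To make the idea of favoring skills by the local entropy of achieved goals more concrete, the following is a minimal sketch and not the GEASD implementation. The abstract gives no formulas, so everything here is an assumption chosen for illustration: the grid-histogram entropy estimator, the softmax weighting, the parameters `cell` and `temperature`, and the skill names `dither` and `traverse`.

```python
import math
from collections import Counter


def local_entropy(goals, cell=1.0):
    """Shannon entropy (in nats) of achieved goals within one window.

    Assumption: goals are discretized into grid cells of side `cell`,
    and entropy is computed over the resulting cell histogram.
    """
    counts = Counter(tuple(math.floor(x / cell) for x in g) for g in goals)
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def skill_distribution(entropy_by_skill, temperature=1.0):
    """Softmax over per-skill entropy estimates (hypothetical weighting rule)."""
    m = max(entropy_by_skill.values())  # subtract the max for numerical stability
    weights = {k: math.exp((h - m) / temperature) for k, h in entropy_by_skill.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


# Achieved goals collected over a short window for two hypothetical skills.
window_dither = [(0.2, 0.1), (0.4, 0.3), (0.1, 0.2), (0.3, 0.4)]    # stays in one cell
window_traverse = [(0.2, 0.1), (1.4, 0.3), (2.1, 1.2), (3.3, 2.4)]  # visits four cells

entropies = {
    "dither": local_entropy(window_dither),
    "traverse": local_entropy(window_traverse),
}
probs = skill_distribution(entropies)
```

In this toy setting the skill whose achieved goals spread over more cells receives the larger sampling probability, whereas a uniform skill distribution would assign 0.5 to each skill regardless of how far its goals spread.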