Exploration is fundamental to reinforcement learning (RL), as it determines how effectively an agent discovers and exploits the underlying structure of its environment to achieve optimal performance. Existing exploration methods generally fall into two categories: active exploration and passive exploration. The former injects stochasticity into the policy but struggles in high-dimensional environments, while the latter adaptively prioritizes transitions in the replay buffer to enhance exploration, yet remains constrained by limited sample diversity. To address the limitations of passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration by generating under-explored critical states and synthesizing dynamics-consistent experiences through transition models. MoGE comprises two components: (1) a diffusion-based generator that synthesizes critical states under the guidance of a utility function evaluating each state's potential influence on policy exploration, and (2) a one-step imagination world model that constructs critical transitions from these states for agent learning. Our method adopts a modular formulation aligned with the principles of off-policy learning, allowing seamless integration with existing algorithms to improve exploration without altering their core structures. Empirical results on OpenAI Gym and the DeepMind Control Suite show that MoGE effectively bridges exploration and policy learning, yielding substantial gains in both sample efficiency and performance across complex control tasks.
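To make the modular formulation concrete, the following is a minimal sketch (not the authors' implementation) of how a MoGE-style augmentation step could wrap an off-policy training loop: a guidance-conditioned generator proposes critical states, a one-step world model synthesizes transitions at those states, and the results are appended to the replay buffer without touching the base algorithm. All function names here (`utility`, `diffusion_generate`, `world_model_step`) and the toy 1-D dynamics are hypothetical stand-ins.

```python
# Illustrative sketch of MoGE-style generative exploration.
# The generator, utility function, and world model below are toy
# placeholders; a real system would use a trained diffusion model
# and a learned transition model.
import random

def utility(state):
    # Hypothetical utility: scores a state's potential influence on
    # policy exploration (e.g., a critic-uncertainty or TD-error proxy).
    return abs(state)

def diffusion_generate(n, guidance):
    # Stand-in for a guidance-conditioned diffusion sampler: draw
    # candidate states and keep those the guidance function rates highest.
    candidates = [random.uniform(-1.0, 1.0) for _ in range(4 * n)]
    return sorted(candidates, key=guidance, reverse=True)[:n]

def world_model_step(state, action):
    # Stand-in one-step "imagination" world model: predicts the next
    # state and reward for a generated critical state (toy 1-D dynamics).
    next_state = 0.9 * state + 0.1 * action
    reward = -abs(next_state)
    return next_state, reward

def augment_replay_buffer(buffer, policy, n_synthetic=8):
    # Core idea: synthesize dynamics-consistent transitions at generated
    # critical states and append them to the replay buffer, leaving the
    # base off-policy algorithm's update rule unchanged.
    for s in diffusion_generate(n_synthetic, guidance=utility):
        a = policy(s)
        s_next, r = world_model_step(s, a)
        buffer.append((s, a, r, s_next))
    return buffer

random.seed(0)
buffer = augment_replay_buffer([], policy=lambda s: -s)
print(len(buffer))  # number of synthetic transitions added
```

Because the augmentation only writes extra transitions into the buffer, any off-policy learner that samples from that buffer can consume the synthetic experience unmodified, which is the modularity property the abstract emphasizes.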