Meta Reinforcement Learning (Meta RL) trains agents that adapt to fast-changing environments and tasks. Current methods often lose adaptation efficiency because model exploration is passive, which delays the agent's understanding of new transition dynamics. As a result, tasks that evolve particularly fast can become impossible to solve. We propose a novel approach, Hypothesis Network Planned Exploration (HyPE), which uses a hypothesis network to integrate an active, planned exploration process that optimizes adaptation speed. HyPE uses a generative hypothesis network to propose candidate models of the state transition dynamics, then eliminates incorrect models through strategically designed experiments. Evaluated on a symbolic version of the Alchemy game, HyPE outperforms baseline methods in both adaptation speed and model accuracy, validating its potential to improve reinforcement learning adaptation in rapidly evolving settings.
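The abstract describes the propose-then-eliminate loop only at a high level. The sketch below illustrates that idea under simplifying assumptions and is not the HyPE method itself. The generative hypothesis network is replaced by an explicit list of candidate deterministic transition models. Each "experiment" is chosen by a minimax rule that picks the action whose worst-case outcome leaves the fewest hypotheses alive. The toy domain, in which a stone value in 0..3 is shifted by ±1 by each potion, is also an assumption and is not the actual Alchemy environment. All names (`predict`, `pick_experiment`, `eliminate`, `planned_exploration`) are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical toy domain: a single stone attribute with values 0..3.
N_VALUES = 4


def predict(hypothesis, state, action):
    """Next state under one candidate model; the hypothesis maps each action to a value shift."""
    return (state + hypothesis[action]) % N_VALUES


def worst_case_survivors(hypotheses, state, action):
    """Largest number of hypotheses that could survive after trying `action` in `state`."""
    outcomes = Counter(predict(h, state, action) for h in hypotheses)
    return max(outcomes.values())


def pick_experiment(hypotheses, state, actions):
    """Assumed minimax rule: choose the action whose worst-case outcome prunes the most."""
    return min(actions, key=lambda a: worst_case_survivors(hypotheses, state, a))


def eliminate(hypotheses, state, action, observed):
    """Keep only the hypotheses whose prediction matches the observed transition."""
    return [h for h in hypotheses if predict(h, state, action) == observed]


def planned_exploration(hypotheses, env_step, state, actions, max_steps=10):
    """Alternate experiment selection and elimination until one model remains or steps run out."""
    alive = list(hypotheses)
    trace = []
    for _ in range(max_steps):
        if len(alive) <= 1:
            break
        action = pick_experiment(alive, state, actions)
        observed = env_step(state, action)  # run the experiment in the real environment
        alive = eliminate(alive, state, action, observed)
        trace.append(action)
        state = observed
    return alive, trace


# Usage: two hypothetical potions, each assumed to shift the stone value by +1 or -1.
actions = ["red", "blue"]
hypotheses = [dict(zip(actions, shifts)) for shifts in product([1, -1], repeat=len(actions))]
true_model = {"red": 1, "blue": -1}  # hidden ground truth for this toy run


def env_step(state, action):
    """Simulated environment that follows the hidden true model."""
    return predict(true_model, state, action)


survivors, trace = planned_exploration(hypotheses, env_step, state=0, actions=actions)
print(survivors)  # → [{'red': 1, 'blue': -1}]
print(trace)      # → ['red', 'blue']
```

Here two chosen experiments identify the correct model out of four candidates. A passive agent that happened to repeat the same potion would never distinguish the models that differ only in the other potion's effect. The minimax selection rule is one simple choice; expected information gain is a common alternative.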