Deep reinforcement learning algorithms are often hampered by sample inefficiency: they depend on many interactions with the environment to acquire accurate decision-making capabilities. Humans, in contrast, rely on the hippocampus to retrieve relevant information from past experiences on related tasks, and this information guides their decisions when learning a new task, so they do not depend exclusively on environmental interactions. Designing a hippocampus-like module that lets an agent incorporate past experiences into established reinforcement learning algorithms nevertheless presents two challenges. The first is selecting the past experiences most relevant to the current task; the second is integrating those experiences into the decision network. To address these challenges, we propose a novel method that uses a retrieval network based on a task-conditioned hypernetwork, which adapts the retrieval network's parameters to the task. In addition, a dynamic modification mechanism strengthens the collaboration between the retrieval and decision networks. We evaluate the proposed method on the MiniGrid environment. The experimental results demonstrate that it significantly outperforms strong baselines.
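To make the idea of a task-conditioned retrieval network more concrete, the following is a minimal sketch and not the method proposed here. It assumes the hypernetwork is a single linear map from a task embedding to the weights of a linear query projection, and that past experiences are stored as (key, value) pairs ranked by cosine similarity. The names `HyperRetriever`, `retrieval_weights`, `query`, and `retrieve`, as well as all dimensions, are hypothetical. The dynamic modification mechanism and the decision network are not modeled.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class HyperRetriever:
    """Hypothetical sketch: a hypernetwork generates retrieval weights per task."""

    def __init__(self, task_dim, state_dim, key_dim, seed=0):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.key_dim = key_dim
        # Assumed hypernetwork: one linear layer mapping the task embedding
        # to the flattened (key_dim x state_dim) retrieval weight matrix.
        self.hyper = [
            [rng.gauss(0.0, 0.1) for _ in range(task_dim)]
            for _ in range(key_dim * state_dim)
        ]

    def retrieval_weights(self, task_embedding):
        """Generate the retrieval network's weights for the given task."""
        flat = matvec(self.hyper, task_embedding)
        return [
            flat[i * self.state_dim:(i + 1) * self.state_dim]
            for i in range(self.key_dim)
        ]

    def query(self, task_embedding, state):
        """Project the current state into key space with task-specific weights."""
        return matvec(self.retrieval_weights(task_embedding), state)

    def retrieve(self, task_embedding, state, memory, k=1):
        """Return the values of the k memory entries most similar to the query.

        `memory` is a list of (key, value) pairs; keys have length key_dim.
        """
        q = self.query(task_embedding, state)
        ranked = sorted(memory, key=lambda item: cosine(q, item[0]), reverse=True)
        return [value for _, value in ranked[:k]]
```

Because the query projection is regenerated from the task embedding on every call, the same state can retrieve different past experiences under different tasks, which is the behavior the task-conditioned design is meant to provide.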