This paper addresses the problem of integrating local guide policies into a Reinforcement Learning agent. We first show how to adapt existing algorithms to this setting, then introduce a novel algorithm based on a noisy policy-switching procedure. The approach builds on a suitable Approximate Policy Evaluation (APE) scheme to provide a perturbation that carefully steers the local guides towards better actions. We evaluated our method on a set of classical Reinforcement Learning problems, including safety-critical systems in which the agent must not enter certain areas, since doing so risks triggering catastrophic consequences. In all the proposed environments, our agent proved efficient at leveraging those guide policies to improve the performance of any APE-based Reinforcement Learning algorithm, especially in the early stages of learning.
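To make the idea of a noisy policy switch more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a one-dimensional continuous action, a local guide that returns `None` outside the region where it is defined, and a critic `q_value` standing in for the APE scheme. The names `guided_action`, `toy_guide`, `toy_learner`, and `toy_q` and the parameters `noise_scale` and `n_candidates` are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Optional

# Hypothetical type aliases for a 1-D toy problem (not from the paper).
State = float
Action = float


def guided_action(
    state: State,
    guide: Callable[[State], Optional[Action]],
    learner: Callable[[State], Action],
    q_value: Callable[[State, Action], float],
    noise_scale: float = 0.1,
    n_candidates: int = 8,
    rng: Optional[random.Random] = None,
) -> Action:
    """Select an action by switching between a local guide and the learner.

    Assumption: where the guide is defined, its action is perturbed with
    Gaussian noise and the critic keeps the best-scoring candidate.
    """
    rng = rng or random.Random()
    base = guide(state)
    if base is None:
        # The guide is local: outside its domain, use the learned policy.
        return learner(state)
    # The unperturbed guide action stays in the pool, so the selected action
    # is never rated worse than the guide's own action by the critic.
    candidates = [base]
    candidates += [base + rng.gauss(0.0, noise_scale) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: q_value(state, a))


# Toy usage: the guide only covers states in [0, 1].
def toy_guide(s: State) -> Optional[Action]:
    return 0.0 if 0.0 <= s <= 1.0 else None


def toy_learner(s: State) -> Action:
    return -1.0


def toy_q(s: State, a: Action) -> float:
    # Toy critic that prefers actions close to 0.5.
    return -((a - 0.5) ** 2)


action = guided_action(0.3, toy_guide, toy_learner, toy_q, rng=random.Random(0))
```

Keeping the guide's own action among the candidates is a deliberate choice in this sketch: an inaccurate critic can then only move the agent away from the guide when it rates a perturbed action strictly higher, which matters most early in learning.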