Evolutionary Algorithms and Deep Reinforcement Learning have both been used successfully to solve control problems across a variety of domains. Recently, algorithms have been proposed that combine these two methods, aiming to leverage the strengths and mitigate the weaknesses of each. In this paper we introduce a new Evolutionary Reinforcement Learning model that combines a particular family of Evolutionary Algorithms, Evolutionary Strategies, with the off-policy Deep Reinforcement Learning algorithm TD3. Instead of a single shared replay buffer, the framework uses a multi-buffer system. This lets the Evolutionary Strategy search the space of policies freely without the risk of overpopulating the replay buffer with trajectories from poorly performing policies. Such trajectories would reduce the number of examples of desirable policy behaviour available to the Deep Reinforcement Learning component, limiting its potential within the shared framework. The proposed algorithm is shown to perform competitively with current Evolutionary Reinforcement Learning algorithms on MuJoCo control tasks, outperforming the well-known state-of-the-art CEM-RL on 3 of the 4 environments tested.
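The abstract does not say how the buffers are organised or sampled. The Python sketch below only illustrates the general idea under our own assumptions: each data source (hypothetically named "td3" and "es") writes to its own bounded FIFO buffer, and the TD3 learner draws a fixed, caller-chosen fraction of every batch from each buffer. Under this arrangement, a flood of poor Evolutionary Strategy rollouts can evict only other ES transitions and cannot crowd out the TD3 actor's own experience. The class `MultiBuffer`, its methods, the capacities and the 50/50 split are illustrative choices, not the paper's design.

```python
import random
from collections import deque
from typing import Any, Dict, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Any, Any, float, Any, bool]


class MultiBuffer:
    """Hypothetical multi-buffer replay store: one FIFO buffer per named source.

    Assumption (not from the paper): each source writes only to its own buffer,
    so transitions from one source can never evict those of another.
    """

    def __init__(self, capacities: Dict[str, int], seed: int = 0) -> None:
        self.buffers: Dict[str, deque] = {
            name: deque(maxlen=cap) for name, cap in capacities.items()
        }
        self.rng = random.Random(seed)

    def add(self, source: str, transition: Transition) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self.buffers[source].append(transition)

    def sample(self, batch_size: int, fractions: Dict[str, float]) -> List[Transition]:
        """Draw a batch whose share from each buffer follows `fractions`.

        Empty buffers contribute nothing, so the batch may then be smaller.
        """
        batch: List[Transition] = []
        names = list(fractions)
        for i, name in enumerate(names):
            if i == len(names) - 1:
                # The last source takes the remainder so rounding stays exact.
                n = batch_size - len(batch)
            else:
                n = int(round(batch_size * fractions[name]))
            buf = self.buffers[name]
            if n > 0 and buf:
                # Sample with replacement so small buffers can still fill their share.
                batch.extend(self.rng.choices(buf, k=n))
        return batch


# Usage: many poor ES transitions do not displace the TD3 actor's transitions.
buffers = MultiBuffer({"td3": 1000, "es": 100})
for t in range(500):
    buffers.add("es", (t, 0, -1.0, t + 1, False))
for t in range(50):
    buffers.add("td3", (t, 1, 1.0, t + 1, False))
batch = buffers.sample(64, {"td3": 0.5, "es": 0.5})
print(len(batch), sum(1 for tr in batch if tr[2] == 1.0))
```

With a single shared buffer of capacity 100, the later ES rollouts would have pushed out older entries regardless of their source. Here the "td3" buffer keeps all 50 of its transitions and supplies half of every batch. The fractions are a free design parameter in this sketch.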