Evolutionary Reinforcement Learning (ERL), which applies Evolutionary Algorithms (EAs) to optimize the weight parameters of Deep Neural Network (DNN) based policies, has been widely regarded as an alternative to traditional reinforcement learning methods. However, evaluating the iteratively generated population usually requires a large amount of computational time and can be prohibitively expensive, which may restrict the applicability of ERL. Surrogate models are often used to reduce the computational burden of evaluation in EAs. Unfortunately, in ERL each individual (a policy) usually represents millions of DNN weight parameters. This high-dimensional policy representation poses a great challenge to applying surrogates in ERL to speed up training. This paper proposes the PE-SAERL framework, which for the first time enables surrogate-assisted evolutionary reinforcement learning via policy embedding (PE). Empirical results on 5 Atari games show that the proposed method performs more efficiently than four state-of-the-art algorithms. Training is accelerated by up to 7x on the tested games compared with its counterpart without the surrogate and PE.