Evolutionary algorithms have been used to evolve a population of actors that generate diverse experiences for training reinforcement learning agents, which helps tackle the temporal credit assignment problem and improves exploration efficiency. However, when this approach is adapted to constrained problems, balancing the trade-off between reward and constraint violation is difficult. In this paper, we propose a novel evolutionary constrained reinforcement learning (ECRL) algorithm, which adaptively balances reward and constraint violation with stochastic ranking and, at the same time, restricts the policy's behaviour by maintaining a set of Lagrange relaxation coefficients with a constraint buffer. Extensive experiments on robotic control benchmarks show that ECRL achieves outstanding performance compared with state-of-the-art algorithms. Ablation analysis shows the benefits of introducing stochastic ranking and the constraint buffer.
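To make the stochastic ranking idea concrete, the following is a minimal sketch of the classic stochastic ranking procedure of Runarsson and Yao, adapted here to reward maximization. It is not taken from the ECRL implementation: the function name `stochastic_rank`, the parameter `p_f` (the probability of comparing two actors by reward alone when at least one of them violates a constraint), and the default value 0.45 are assumptions made for illustration.

```python
import random


def stochastic_rank(rewards, violations, p_f=0.45, sweeps=None, rng=None):
    """Return actor indices ordered from best to worst.

    Hypothetical helper, not the authors' code. `rewards` are to be
    maximized; `violations` are non-negative constraint violations,
    with 0 meaning the actor is feasible.
    """
    rng = rng or random.Random()
    n = len(rewards)
    order = list(range(n))
    sweeps = n if sweeps is None else sweeps

    # Bubble-sort-like sweeps over adjacent pairs.
    for _ in range(sweeps):
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            both_feasible = violations[a] == 0 and violations[b] == 0
            if both_feasible or rng.random() < p_f:
                # Compare by reward: the higher reward ranks first.
                out_of_order = rewards[a] < rewards[b]
            else:
                # Compare by violation: the lower violation ranks first.
                out_of_order = violations[a] > violations[b]
            if out_of_order:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break  # No swap in a full sweep: the ranking is stable.
    return order


if __name__ == "__main__":
    rewards = [1.0, 5.0, 3.0, 9.0]
    violations = [0.0, 0.0, 2.0, 1.0]
    print(stochastic_rank(rewards, violations, rng=random.Random(0)))
```

In this sketch, setting `p_f` to 0 ranks any pair containing an infeasible actor purely by constraint violation, while setting it to 1 ignores violations and ranks by reward only. Intermediate values let some high-reward but infeasible actors survive selection, which is one way the balance between reward and constraint violation can be tuned.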