Evolutionary reinforcement learning (ERL) algorithms have recently attracted attention for tackling complex reinforcement learning (RL) problems because they parallelize well, but they are prone to insufficient exploration or model collapse unless their hyperparameters (also called meta-parameters) are carefully tuned. In this paper, we propose BiERL, a general meta ERL framework based on bilevel optimization that updates the hyperparameters jointly with, and in parallel to, the training of the ERL model within a single agent. This removes the need for prior domain knowledge or a costly optimization procedure before model deployment. We design a meta-level architecture that embeds the inner level's evolving experience into an informative population representation, and we introduce a simple and feasible evaluation of the meta-level fitness function to improve learning efficiency. Extensive experiments on MuJoCo and Box2D tasks show that, as a general framework, BiERL outperforms various baselines and consistently improves the learning performance of a diverse set of ERL algorithms.
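To make the bilevel idea concrete, the following is a minimal sketch under stated assumptions, not the BiERL implementation. The inner level runs a plain evolution strategy on a toy objective, and the outer (meta) level adapts a single hyperparameter, the perturbation scale `sigma`, using the inner-level fitness gain as a stand-in meta-fitness. The toy objective, the candidate-set meta update, and the names `fitness`, `es_step`, and `bilevel_es` are all hypothetical. The population-representation architecture described in the abstract is not reproduced here.

```python
import math
import random


def fitness(theta):
    """Toy inner-level objective (assumption): negative squared norm, higher is better."""
    return -sum(x * x for x in theta)


def es_step(theta, sigma, lr, pop_size, rng):
    """One basic evolution-strategy update using Gaussian perturbations of scale sigma."""
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(pop_size)]
    scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
    mean = sum(scores) / pop_size
    # Fall back to 1.0 if every population member scored identically.
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / pop_size) or 1.0
    # Search-gradient estimate from standardized scores.
    grad = [
        sum((s - mean) / std * eps[i] for s, eps in zip(scores, noises)) / (pop_size * sigma)
        for i in range(len(theta))
    ]
    return [t + lr * g for t, g in zip(theta, grad)]


def bilevel_es(theta, sigma=0.5, lr=0.05, pop_size=20, meta_steps=30, seed=0):
    """Alternate a meta-level choice of sigma with an inner-level ES update."""
    rng = random.Random(seed)
    history = [fitness(theta)]
    for _ in range(meta_steps):
        # Meta level: propose a smaller, an unchanged, and a larger sigma (clamped).
        candidates = [min(10.0, max(1e-3, sigma * math.exp(d))) for d in (-0.3, 0.0, 0.3)]
        trials = []
        for cand in candidates:
            new_theta = es_step(theta, cand, lr, pop_size, rng)
            # Stand-in meta-fitness: inner-level fitness gain from this update.
            trials.append((fitness(new_theta) - history[-1], cand, new_theta))
        gain, sigma, new_theta = max(trials, key=lambda t: t[0])
        # Keep the inner-level update only if it actually improved fitness.
        if gain > 0:
            theta = new_theta
        history.append(fitness(theta))
    return theta, sigma, history
```

Calling `bilevel_es([5.0, -3.0, 2.0])` returns the final parameters, the adapted `sigma`, and the fitness history. Because an inner update is accepted only when it improves fitness, the history never decreases. That acceptance rule is a simplification chosen for this sketch, not something the abstract specifies.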