Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative reward through interaction with an environment. Combining RL with deep learning has recently produced impressive results on a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, several crucial challenges remain: brittle convergence caused by sensitivity to hyperparameters, difficult temporal credit assignment under long time horizons and sparse rewards, a lack of diverse exploration (especially in continuous search spaces), difficult credit assignment in multi-agent reinforcement learning, and conflicting reward objectives. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in EvoRL, highlighting the important challenges and opportunities for future research, and is intended to help them develop more efficient methods and tailored benchmarks that further advance this cross-disciplinary field.