Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative reward through interaction with an environment. Combining RL with deep learning has recently produced impressive results on a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, several crucial challenges remain: brittle convergence caused by sensitive hyperparameters; difficult temporal credit assignment under long time horizons and sparse rewards; a lack of diverse exploration, especially in continuous search spaces; difficult credit assignment in multi-agent reinforcement learning; and conflicting reward objectives. Evolutionary computation (EC), which maintains a population of learning agents, has shown promise in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods that integrate EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research areas in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey is intended as a resource for researchers and practitioners interested in EvoRL, highlighting the important challenges and opportunities for future research. With its help, researchers and practitioners can develop more efficient methods and tailored benchmarks for EvoRL, further advancing this promising cross-disciplinary field.