Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative reward through interaction with an environment. The integration of RL with deep learning has recently produced impressive results on a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, several crucial challenges remain: brittle convergence caused by sensitivity to hyperparameters, difficult temporal credit assignment under long time horizons and sparse rewards, a lack of diverse exploration (especially in continuous search spaces), difficult credit assignment among agents in multi-agent RL, and conflicting reward objectives. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in EvoRL, highlighting important challenges and opportunities for future research and supporting the development of more efficient methods and tailored benchmarks in this cross-disciplinary field.