Deep Reinforcement Learning (DRL) and deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, including game AI, autonomous vehicles, and robotics. However, DRL and deep MARL agents are widely known to be sample-inefficient: millions of interactions are usually needed even for relatively simple problem settings, which prevents wide application and deployment in real industrial scenarios. One bottleneck behind this inefficiency is the well-known exploration problem, i.e., how to efficiently explore the environment and collect informative experiences that benefit policy learning toward an optimal policy. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey of existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond the two main branches of methods, we also include other notable exploration methods with different ideas and techniques. In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.