Reinforcement learning research has achieved significant success and attracted considerable attention by using deep neural networks to solve problems in high-dimensional state or action spaces. While deep reinforcement learning policies are currently being deployed in fields ranging from medical applications to self-driving vehicles, open questions remain about the generalization capabilities of these policies. In this paper, we outline the fundamental reasons why deep reinforcement learning policies encounter overfitting problems that limit their robustness and generalization capabilities. Furthermore, we formalize and unify the diverse solution approaches for increasing generalization and overcoming overfitting in state-action value functions. We believe our study can provide a compact, systematic, and unified analysis of current advancements in deep reinforcement learning, and help to construct robust deep neural policies with improved generalization abilities.