Deep Reinforcement Learning, a textbook

From arXiv. Revised version 2023: added a description of Monte Carlo sampling and the n-step algorithm, and improved the explanation of on-policy and off-policy learning. Preprint available by permission of the publisher.

Deep reinforcement learning has attracted much attention recently. Impressive results have been achieved in activities as diverse as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to solve difficult problems. They have learned to fly model helicopters and perform aerobatic manoeuvres such as loops and rolls. In some applications, such as Atari, Go, poker, and StarCraft, they have even become better than the best humans. The way in which deep reinforcement learning explores complex environments reminds us of how children learn: by playfully trying things out, getting feedback, and trying again. The computer seems to truly possess aspects of human learning; this goes to the heart of the dream of artificial intelligence. The successes in research have not gone unnoticed by educators, and universities have started to offer courses on the subject. The aim of this book is to provide a comprehensive overview of the field of deep reinforcement learning. The book is written for graduate students of artificial intelligence, and for researchers and practitioners who wish to better understand deep reinforcement learning methods and their challenges. We assume an undergraduate-level understanding of computer science and artificial intelligence; the programming language of this book is Python. We describe the foundations, the algorithms, and the applications of deep reinforcement learning. We cover the established model-free and model-based methods that form the basis of the field. Because the field develops quickly, we also cover advanced topics: deep multi-agent reinforcement learning, deep hierarchical reinforcement learning, and deep meta learning.
