An introduction to reinforcement learning for neuroscience

Reinforcement learning has a rich history in neuroscience, from early work on dopamine as a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to recent work suggesting that dopamine could implement a form of 'distributional reinforcement learning' popularized in deep learning (Dabney et al., 2020). Throughout this literature, there has been a tight link between theoretical advances in reinforcement learning and neuroscientific experiments and findings. As a result, the theories describing our experimental data have become increasingly complex and difficult to navigate. In this review, we cover the basic theory underlying classical work in reinforcement learning and build up to an introductory overview of methods used in modern deep reinforcement learning that have found applications in systems neuroscience. We start with an overview of the reinforcement learning problem and classical temporal difference algorithms, followed by a discussion of 'model-free' and 'model-based' reinforcement learning together with methods such as DYNA and successor representations that fall in between these two categories. Throughout these sections, we highlight the close parallels between the machine learning methods and related work in both experimental and theoretical neuroscience. We then provide an introduction to deep reinforcement learning with examples of how these methods have been used to model different learning phenomena in the systems neuroscience literature, such as meta-reinforcement learning (Wang et al., 2018) and distributional reinforcement learning (Dabney et al., 2020). Code that implements the methods discussed in this work and generates the figures is also provided.
