In this paper, we present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme. The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.