In this lecture, we present a general perspective on reinforcement learning (RL) objectives through three versions of the objective. The first is the standard definition of the objective in the RL literature. We then extend it to a $\lambda$-return version, which generalizes the standard definition. Finally, we propose a general objective that unifies the previous two versions. This last version offers a high-level view of the RL objective: it gives a fundamental formulation that connects several widely used RL techniques (e.g., TD$(\lambda)$ and GAE), and it could potentially be applied to a wide range of RL algorithms.
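To make the connection between the $\lambda$-return and GAE concrete, the following is a minimal sketch using the textbook recursions rather than the lecture's own formulation. The function names `lambda_returns` and `gae_advantages` are hypothetical, and the sketch assumes a finite trajectory in which `values` holds one extra entry, the bootstrap value of the final state.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Backward recursion for the lambda-return on a finite trajectory.

    Assumption: len(values) == len(rewards) + 1, where values[-1] is the
    bootstrap value estimate of the state reached after the last reward.
    """
    T = len(rewards)
    returns = [0.0] * T
    next_return = values[T]  # bootstrap from the final state's value
    for t in reversed(range(T)):
        # Blend the one-step bootstrap target with the longer lambda-return.
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates via the standard backward recursion."""
    T = len(rewards)
    advantages = [0.0] * T
    next_advantage = 0.0
    for t in reversed(range(T)):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
    return advantages


if __name__ == "__main__":
    # Illustrative numbers only, not data from the lecture.
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.4, 0.3, 0.0]
    g = lambda_returns(rewards, values, gamma=0.9, lam=0.8)
    a = gae_advantages(rewards, values, gamma=0.9, lam=0.8)
    # Under these definitions, lambda-return = GAE advantage + value baseline.
    for t in range(len(rewards)):
        print(t, g[t], a[t] + values[t])
```

Under these assumptions, setting `lam=0` recovers the one-step TD target and `lam=1` recovers the discounted Monte Carlo return with a bootstrap at the end of the trajectory, while the GAE advantage plus the value baseline equals the $\lambda$-return at every step.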