Modern reinforcement learning has been conditioned by at least three dogmas. The first is the environment spotlight, which refers to our tendency to focus on modeling environments rather than agents. The second is our treatment of learning as finding the solution to a task, rather than as adaptation. The third is the reward hypothesis, which states that all goals and purposes can be well thought of as the maximization of a reward signal. These three dogmas shape much of what we think of as the science of reinforcement learning. While each of the dogmas has played an important role in developing the field, it is time we bring them to the surface and reflect on whether they belong among the basic ingredients of our scientific paradigm. In order to realize the potential of reinforcement learning as a canonical frame for researching intelligent agents, we suggest that it is time we shed dogmas one and two entirely, and embrace a nuanced approach to the third.