In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating learning as endless adaptation. In contrast, continual reinforcement learning refers to the setting in which the best agents never stop learning. Despite the importance of continual reinforcement learning, the community lacks a simple definition of the problem that highlights its commitments and makes its primary concepts precise and clear. To this end, this paper is dedicated to carefully defining the continual reinforcement learning problem. We formalize the notion of agents that "never stop learning" through a new mathematical language for analyzing and cataloging agents. Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents. We provide two motivating examples, illustrating that traditional views of multi-task reinforcement learning and continual supervised learning are special cases of our definition. Collectively, these definitions and perspectives formalize many intuitive concepts at the heart of learning, and open new research pathways surrounding continual learning agents.