When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.