Computational accounts of purposeful behavior comprise descriptive and normative aspects. The former enable agents to ascertain the current (or future) state of affairs in the world; the latter, to evaluate the desirability, or lack thereof, of these states with respect to the agent's goals. In Reinforcement Learning, the normative aspect (reward and value functions) is assumed to depend on a pre-defined and fixed descriptive one (the state representation). Alternatively, these two aspects may emerge interdependently: goals can be, and indeed often are, expressed in terms of state-representation features, but they may also serve to shape the state representations themselves. Here, we illustrate a novel theoretical framing of state representation learning in bounded agents, coupling descriptive and normative aspects via the notion of goal-directed, or telic, states. We define a new controllability property of telic state representations that characterizes the tradeoff between their granularity and the policy complexity required to reach all telic states. We propose an algorithm for learning controllable state representations and demonstrate it in a simple navigation task with changing goals. Our framework highlights the crucial role of deliberate ignorance (knowing what to ignore) in learning state representations that are both goal-flexible and simple. More broadly, our work takes a concrete step toward a unified theoretical view of natural and artificial learning through the lens of goals.