In the sequential decision-making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. These environments are modeled as discrete Markov decision processes in which both states and actions are represented by feature vectors. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and one that is shared across environments. For example, consider a set of environments that share the same laws of motion. In this setting, the agent can collect a finite number of reward-free interactions from a subset of these environments. The agent must then be able to approximately solve any planning task defined over any environment in the original set, relying only on those interactions. Can we design a provably efficient algorithm that achieves this ambitious goal of systematic generalization? In this paper, we give a partially positive answer to this question. First, we provide a tractable formulation of systematic generalization by adopting a causal viewpoint. Then, under specific structural assumptions, we provide a simple learning algorithm that guarantees any desired planning error up to an unavoidable sub-optimality term, with polynomial sample complexity.