In the sequential decision making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. These environments are modeled as discrete Markov decision processes in which states and actions are each represented by a feature vector. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and another that is shared. As an example, consider a set of environments that share the same laws of motion. In this setting, the agent can collect a finite number of reward-free interactions with a subset of these environments. It must then be able to approximately solve any planning task defined over any environment in the original set, relying on those interactions only. Can we design a provably efficient algorithm that achieves this ambitious goal of systematic generalization? In this paper, we give a partially positive answer to this question. First, we provide a tractable formulation of systematic generalization by employing a causal viewpoint. Then, under specific structural assumptions, we provide a simple learning algorithm that guarantees any desired planning error up to an unavoidable sub-optimality term, with polynomial sample complexity.