World-Task Factorization for Robot Learning

Robot learning must produce policies that generalize to new combinations of constraints, teammates, and environments. Achieving this requires factoring the policy structurally, a choice that dictates what generalizes, what requires retraining, and what remains entangled. Existing methods span a wide spectrum, from expecting structure to emerge as data scales to hand-designing it through hierarchies, skill libraries, or learned specializations. In this paper, we study what we argue is the most fundamental factorization in robotics: separating the world from the task. We investigate the conditions under which this factorization is principled. World factors are properties of the embodied system and the environment; they exist independently of intent. Task factors are defined by the task's logic over what the world admits. We formalize this asymmetry through Bayesian model evidence: the factorization aligns with the data-generating process, maintains high likelihood through an analytical world model, and reduces the Occam's razor penalty on task parameters. We instantiate the factorization by pairing two components. The first is AICON, a differentiable graph of recursive estimators and interconnections that is compositional, operates without task-specific data, and propagates cost gradients to actuators. The second is a compact learned policy that modulates gradient paths. Gradients serve as the interface between the two factors: they carry world structure through the graph and task structure through costs, enabling low-dimensional learning while preserving structural generalization. We test the world/task factorization on three problems that span heterogeneous robots, environments, task logic, and sensorimotor modalities. Our framework outperforms end-to-end baselines and analytical heuristics in all settings, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.
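
For readers unfamiliar with the model-evidence argument, the standard textbook form of Bayesian model evidence and its Laplace approximation is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{D}$ is the data, $\mathcal{M}$ a candidate factorization, $\theta$ its $k$ learned parameters, and $A$ the Hessian of the negative log posterior at the mode $\theta_{\mathrm{MAP}}$.

```latex
% Model evidence: likelihood averaged over the prior on the learned parameters
p(\mathcal{D} \mid \mathcal{M})
  = \int p(\mathcal{D} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, d\theta

% Laplace approximation: best-fit likelihood times an Occam factor
p(\mathcal{D} \mid \mathcal{M})
  \approx
  \underbrace{p(\mathcal{D} \mid \theta_{\mathrm{MAP}}, \mathcal{M})}_{\text{best-fit likelihood}}
  \;\times\;
  \underbrace{p(\theta_{\mathrm{MAP}} \mid \mathcal{M})\,(2\pi)^{k/2}\,\det(A)^{-1/2}}_{\text{Occam factor}},
\qquad
A = -\nabla_{\theta}\nabla_{\theta} \log p(\theta \mid \mathcal{D}, \mathcal{M})\,\Big|_{\theta = \theta_{\mathrm{MAP}}}
```

Under this reading, the Occam factor is the fraction of prior parameter volume that remains plausible after seeing the data, and it typically shrinks as $k$ grows. A model that keeps the learned, task-side parameter set small while an analytical world model maintains the best-fit likelihood would therefore be favored by the evidence, which is one way to interpret the claim above.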
