Robot learning must produce policies that generalize to new combinations of constraints, teammates, and environments. Achieving this requires structurally factoring the policy, a choice that dictates what generalizes, what requires retraining, and what remains entangled. Existing methods span a wide spectrum, from expecting structure to emerge from data scaling to hand-designing it via hierarchies, skill libraries, or learned specializations. In this paper, we study what we argue is the most fundamental factorization in robotics: separating the world from the task. We investigate the conditions under which this factorization is principled. World factors are properties of the embodied system and the environment; they exist independently of intent. Task factors are defined by the task's logic over what the world admits. We formalize this asymmetry through Bayesian model evidence: the factorization aligns with the data-generating process, maintains high likelihood through an analytical world model, and reduces the Occam's razor penalty on task parameters. We instantiate this factorization by pairing AICON with a compact learned policy that modulates gradient paths. AICON is a compositional, differentiable graph of recursive estimators and interconnections; it operates without task-specific data and propagates cost gradients to the actuators. Gradients serve as the interface between the two factors: they carry world structure through the graph and task structure through the costs, enabling low-dimensional learning while preserving structural generalization. We test the world/task factorization on three problems that span heterogeneous robots, environments, task logic, and sensorimotor modalities. Our framework outperforms end-to-end baselines and analytical heuristics in all settings, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.
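The idea that gradients act as the interface between world and task can be illustrated with a deliberately tiny sketch. Everything below is a hypothetical stand-in, not AICON or its API: a one-dimensional analytical model `x_next = x + dt * u` plays the world factor, two cost terms (reach a goal, keep clear of an obstacle) play the task factor, and a two-element `weights` tuple stands in for the compact learned policy that decides how strongly each cost's gradient path reaches the actuator. Plain gradient descent on the action is used only for simplicity.

```python
import math
from dataclasses import dataclass


# Hypothetical toy, not the AICON implementation: one analytical estimator that
# predicts the next position from the current position and a velocity command.
@dataclass
class World:
    dt: float  # world factor: a property of the embodiment, independent of the task

    def predict(self, x: float, u: float) -> float:
        return x + self.dt * u

    def d_predict_du(self) -> float:
        # Jacobian of the prediction with respect to the actuator command u
        return self.dt


def goal_cost_grad(x_next: float, goal: float) -> float:
    # Derivative of the task cost (x_next - goal)^2 with respect to x_next
    return 2.0 * (x_next - goal)


def obstacle_cost_grad(x_next: float, obstacle: float, width: float = 0.5) -> float:
    # Derivative of the task cost exp(-(x_next - obstacle)^2 / width) w.r.t. x_next
    d = x_next - obstacle
    return -2.0 * d / width * math.exp(-d * d / width)


def action_gradient(world, x, u, goal, obstacle, weights):
    # Chain rule: each task-cost gradient flows through the world model to u.
    # `weights` (hypothetical) only rescales which gradient paths reach the
    # actuator; it never models the world itself.
    x_next = world.predict(x, u)
    g = (weights[0] * goal_cost_grad(x_next, goal)
         + weights[1] * obstacle_cost_grad(x_next, obstacle))
    return g * world.d_predict_du()


def act(world, x, goal, obstacle, weights, steps=2000, lr=5.0):
    # Gradient descent on the action, starting from rest
    u = 0.0
    for _ in range(steps):
        u -= lr * action_gradient(world, x, u, goal, obstacle, weights)
    return u


if __name__ == "__main__":
    weights = (1.0, 0.0)  # the same "policy" is reused below without retraining
    print(round(act(World(dt=0.1), 0.0, 1.0, 5.0, weights), 3))  # 10.0
    print(round(act(World(dt=0.2), 0.0, 1.0, 5.0, weights), 3))  # 5.0: new world, same weights
```

In this toy, changing `dt` touches only the world model and changing `goal` or `obstacle` touches only the costs, while the learned part stays a two-number vector; that separation is the kind of behavior the world/task factorization is meant to provide, although the real system is far richer than this sketch.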