We introduce COMET (Causal Object-centric Model for Efficient Tree search), a model-based reinforcement learning algorithm that performs Monte Carlo Tree Search in a slot-structured latent space. COMET pairs a frozen, unsupervised object-centric encoder with a transformer-based world model that binds actions to objects through a novel action-slot fusion mechanism used for slot transition prediction. The policy and value heads use object-causal attention, which modulates token interactions by learned per-slot relevance scores so that decision-making concentrates on task-relevant entities. COMET thereby adds an explicit object-level inductive bias to MuZero-style latent planning. Across eight visually and dynamically diverse tasks from the Object-Centric Visual RL benchmark, ManiSkill, Robosuite, and VizDoom, COMET achieves a higher mean normalized score in the early stages of training than both object-centric and monolithic baselines.
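
To make the idea of relevance-modulated attention concrete, the following is a minimal sketch, not COMET's actual implementation. It assumes one plausible form of the mechanism: each key slot's attention logit is shifted by the log of its relevance score, which is equivalent to scaling the unnormalized attention weight by that score. The function name `relevance_modulated_attention`, the helper `softmax`, and the fixed `relevance` values are hypothetical; in a trained model the scores would be produced by a learned module.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_modulated_attention(queries, keys, values, relevance):
    """Scaled dot-product attention biased by per-slot relevance.

    ASSUMPTION: relevance enters as an additive log-bias on the logits of
    each key slot. This is an illustrative choice, not the paper's formula.

    queries, keys, values: lists of slot vectors (lists of floats).
    relevance: one score in (0, 1] per key slot.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + math.log(r)
            for k, r in zip(keys, relevance)
        ]
        weights = softmax(logits)
        # Weighted sum of value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two slots with identical (zero) keys, so only relevance separates them.
slots = [[0.0, 0.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
mixed = relevance_modulated_attention(slots, slots, values, [0.9, 0.1])
```

Because the two keys are identical here, the attention weights reduce to the normalized relevance scores, so every output is dominated by the value of the high-relevance slot. With non-trivial keys, the relevance term acts as a prior that content-based similarity can still override.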