Real-world robotic tasks stretch over extended horizons and encompass multiple stages. Learning long-horizon manipulation tasks, however, is a long-standing challenge that demands decomposing the overarching task into several manageable subtasks to facilitate policy learning and generalization to unseen tasks. Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks. To address these shortcomings, we propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long-horizon manipulation that uses pre-trained visual representations designed for robotic control. At a high level, UVD discovers subgoals by detecting phase shifts in the embedding space of the pre-trained representation. Operating purely on visual demonstrations without auxiliary information, UVD effectively extracts the visual subgoals embedded in the videos while incurring zero additional training cost on top of standard visuomotor policy training. Goal-conditioned policies learned with UVD-discovered subgoals exhibit significantly improved compositional generalization to unseen tasks at test time. Furthermore, UVD-discovered subgoals can be used to construct goal-based reward shaping that jump-starts temporally extended exploration for reinforcement learning. We extensively evaluate UVD on both simulated and real-world tasks, and in all cases UVD substantially outperforms baselines across imitation and reinforcement learning settings, on in-domain and out-of-domain task sequences alike, validating the clear advantage of automated visual task decomposition within the simple, compact UVD framework.
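To make the idea of detecting phase shifts in embedding space concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the detection rule, so the sketch assumes one plausible choice: take the final frame as the goal, walk backward through the demonstration, and mark a subgoal at the latest frame where the embedding distance to the current goal stops decreasing monotonically. The function name `discover_subgoals`, the tolerance `tol`, and the use of Euclidean distance are all assumptions, and the hand-written vectors in the usage example stand in for the outputs of a pre-trained visual encoder.

```python
import math


def discover_subgoals(embeddings, tol=1e-6):
    """Return sorted frame indices of assumed subgoals in one demonstration.

    embeddings: list of equal-length float vectors, one per video frame
                (in practice these would come from a pre-trained encoder).
    tol:        hypothetical slack that ignores tiny non-monotonic jitter.
    """
    if not embeddings:
        return []

    goal = len(embeddings) - 1      # the last frame is always a (sub)goal
    subgoals = [goal]

    while goal > 0:
        # Distance of every earlier frame to the current goal frame.
        dist = [math.dist(e, embeddings[goal]) for e in embeddings[: goal + 1]]

        # Walk backward while the frames still approach the goal monotonically,
        # i.e. while an earlier frame is at least as far away as the next one.
        t = goal - 1
        while t >= 0 and dist[t] + tol >= dist[t + 1]:
            t -= 1

        if t <= 0:
            break                   # no break in monotonicity (or a trivial one)

        # Assumed rule: the frame where monotonicity breaks becomes the
        # previous subgoal, and the search repeats on the shorter prefix.
        subgoals.append(t)
        goal = t

    return sorted(subgoals)


# Toy 1-D "embeddings": move away from the start, then return to it.
frames = [[0.0], [1.0], [2.0], [3.0], [2.0], [1.0], [0.0]]
print(discover_subgoals(frames))
```

Because the procedure only compares distances between frame embeddings, it needs no training beyond the frozen encoder, which is consistent with the abstract's claim of zero additional training cost. Real embeddings are noisy, so a practical version would likely smooth the distance curve or use a larger `tol`.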