Image and video generative models pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for low-level goal-conditioned policies to reach. However, the performance of these systems can be severely bottlenecked by the interface between generative models and low-level controllers. For example, generative models may predict photorealistic yet physically infeasible frames that confuse low-level policies. Low-level policies may also be sensitive to subtle visual artifacts in generated goal images. This paper addresses these two problems, providing an interface that effectively "glues together" language-conditioned image or video prediction models with low-level goal-conditioned policies. Our method, Generative Hierarchical Imitation Learning-Glue (GHIL-Glue), filters out subgoals that do not lead to task progress and improves the robustness of goal-conditioned policies to generated subgoals with harmful visual artifacts. In extensive experiments in both simulated and real environments, we find that GHIL-Glue achieves a 25% improvement across several hierarchical models that leverage generative subgoals, setting a new state of the art on the CALVIN simulation benchmark for policies that use observations from a single RGB camera. In physical experiments, GHIL-Glue also outperforms other generalist robot policies on three of four language-conditioned manipulation tasks that test zero-shot generalization.
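The subgoal filtering step can be pictured with a small sketch. The abstract does not say how filtering is implemented, so the version below makes an assumption: the planner samples several candidate subgoals, and a scorer that estimates task progress picks the best one. The names `select_subgoal`, `sample_subgoal`, `progress_score`, and `num_candidates` are hypothetical, and images are replaced by plain floats so the example stays runnable.

```python
from typing import Callable, List, Tuple

# Toy stand-in: an "image" is just a float here so the sketch stays runnable.
Image = float


def select_subgoal(
    obs: Image,
    instruction: str,
    sample_subgoal: Callable[[Image, str], Image],
    progress_score: Callable[[Image, Image, str], float],
    num_candidates: int = 4,
) -> Tuple[Image, List[float]]:
    """Sample several candidate subgoals and keep the highest-scoring one.

    Assumption: filtering is done by ranking candidates with a progress
    scorer. This is an illustrative sketch, not the paper's implementation.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [sample_subgoal(obs, instruction) for _ in range(num_candidates)]
    scores = [progress_score(obs, goal, instruction) for goal in candidates]
    best_index = max(range(num_candidates), key=lambda i: scores[i])
    return candidates[best_index], scores


# Hypothetical toy setup: the task is to move a 1-D state toward 1.0.
TARGET = 1.0
_proposals = iter([0.2, 0.9, -0.5, 0.4])


def toy_generator(obs: Image, instruction: str) -> Image:
    # Stands in for a generative model; returns fixed proposals in order.
    return next(_proposals)


def toy_progress(obs: Image, goal: Image, instruction: str) -> float:
    # Positive when the goal is closer to the target than the current state.
    return abs(TARGET - obs) - abs(TARGET - goal)


chosen, scores = select_subgoal(0.0, "move right", toy_generator, toy_progress)
print(chosen)  # 0.9
```

In a real system the generator would be the image or video prediction model and the scorer would be learned from data. The candidate that moves away from the target (`-0.5`) gets a negative score and is never passed to the low-level policy.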