GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel

From arXiv. Code, model checkpoints, and videos can be found at https://ghil-glue.github.io

Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for low-level goal-conditioned policies to reach. However, the performance of these systems can be severely bottlenecked by the interface between generative models and low-level controllers. For example, generative models may predict photorealistic yet physically infeasible frames that confuse low-level policies. Low-level policies may also be sensitive to subtle visual artifacts in generated goal images. This paper addresses these two facets of generalization, providing an interface to effectively "glue together" language-conditioned image or video prediction models with low-level goal-conditioned policies. Our method, Generative Hierarchical Imitation Learning-Glue (GHIL-Glue), filters out subgoals that do not lead to task progress and improves the robustness of goal-conditioned policies to generated subgoals with harmful visual artifacts. In extensive experiments in both simulated and real environments, we find that GHIL-Glue achieves a 25% improvement across several hierarchical models that leverage generative subgoals, achieving a new state of the art on the CALVIN simulation benchmark for policies using observations from a single RGB camera. GHIL-Glue also outperforms other generalist robot policies on 3 of 4 language-conditioned manipulation tasks testing zero-shot generalization in physical experiments.
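The abstract does not say how task progress is scored. As a minimal sketch, assuming a hypothetical scorer `progress_score(observation, subgoal)` that returns a higher value when a candidate subgoal is judged to move the task forward, a sample-and-filter step between the generative model and the low-level policy could look like the following. The function name `select_subgoal`, the `threshold` value, and the choice to return `None` when nothing passes are illustrative assumptions, not GHIL-Glue's published implementation. The sketch covers only subgoal filtering, not the robustness-to-artifacts component.

```python
from typing import Callable, Optional, Sequence, TypeVar

Obs = TypeVar("Obs")  # current observation (e.g. an RGB frame); type is an assumption
Img = TypeVar("Img")  # candidate subgoal image from the generative model


def select_subgoal(
    observation: Obs,
    candidates: Sequence[Img],
    progress_score: Callable[[Obs, Img], float],  # hypothetical progress scorer
    threshold: float = 0.5,  # illustrative cutoff, not a value from the paper
) -> Optional[Img]:
    """Hypothetical filter: keep subgoals judged to make task progress, return the best."""
    # Score every candidate subgoal sampled from the generative planner.
    scored = [(progress_score(observation, goal), goal) for goal in candidates]
    # Discard candidates whose estimated progress falls below the threshold.
    kept = [(score, goal) for score, goal in scored if score >= threshold]
    if not kept:
        # No candidate passed the filter; the caller decides how to recover.
        return None
    # Hand the highest-scoring surviving subgoal to the goal-conditioned policy.
    return max(kept, key=lambda pair: pair[0])[1]


# Toy usage: "images" are floats and progress is how far past the observation they lie.
best = select_subgoal(0.0, [0.2, 0.9, 0.6], lambda obs, goal: goal - obs)
print(best)  # 0.9
```

Returning `None` leaves the fallback (for example, resampling from the generator or keeping the previous subgoal) to the caller; that policy is likewise an assumption made for illustration.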