Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

Robots operating in the real world require both rich manipulation skills and the ability to reason semantically about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities. However, we show that the conventional pretraining-finetuning pipeline for integrating such representations entangles the learning of domain-specific action information and domain-general visual information, leading to less data-efficient training and poor generalization to unseen objects and tasks. To address this, we propose ProgramPort, a modular approach that better leverages pretrained VL models by exploiting the syntactic and semantic structures of language instructions. Our framework uses a semantic parser to recover an executable program, composed of functional modules grounded on vision and action across different modalities. Each functional module is realized as a combination of deterministic computation and learnable neural networks. Program execution produces parameters for general manipulation primitives for a robotic end-effector. The entire modular network can be trained with end-to-end imitation learning objectives. Experiments show that our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors. Project webpage: https://progport.github.io
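
As a rough illustration of the parse-then-execute idea described above, the following toy sketch turns one instruction into a nested program and evaluates it into pick and place parameters. Everything in it is an assumption made for illustration: the module names `filter`, `pick`, and `place`, the `Node` class, the single hard-coded instruction template, and the dictionary that stands in for visual grounding are hypothetical and are not taken from ProgramPort. The real system uses learned neural modules and a semantic parser, neither of which is modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[int, int]
Scene = Dict[str, Point]  # toy scene: object description -> (x, y) location


@dataclass(frozen=True)
class Node:
    """One node of the toy program: a module name plus its arguments."""
    op: str
    args: tuple


def parse(instruction: str) -> Node:
    """Hypothetical stand-in for a semantic parser.

    Handles only the template 'put the X in the Y'.
    """
    words = instruction.lower().rstrip(".").split()
    if words[:2] != ["put", "the"] or "in" not in words:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    split = words.index("in")
    obj = " ".join(words[2:split])
    target = " ".join(words[split + 2:])  # skip the words 'in the'
    pick = Node("pick", (Node("filter", (obj,)),))
    return Node("place", (pick, Node("filter", (target,))))


def execute(node: Node, scene: Scene):
    """Recursively evaluate the program against the toy scene."""
    if node.op == "filter":
        # Grounding step: a dictionary lookup here, standing in for
        # a learned vision-language module.
        (name,) = node.args
        return scene[name]
    if node.op == "pick":
        (child,) = node.args
        return {"pick": execute(child, scene)}
    if node.op == "place":
        pick_node, target_node = node.args
        params = execute(pick_node, scene)
        params["place"] = execute(target_node, scene)
        return params
    raise ValueError(f"unknown module: {node.op!r}")


scene = {"red block": (2, 3), "brown box": (7, 1)}
program = parse("Put the red block in the brown box")
params = execute(program, scene)
print(params)  # {'pick': (2, 3), 'place': (7, 1)}
```

The point of the sketch is the separation of concerns: only the `filter` step touches perception, while `pick` and `place` only assemble action parameters, so swapping in an unseen object description changes the grounding step without touching the action modules.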

