Improving the generalization of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robot data, such as the RT-1 dataset, which is costly and time-consuming. Moreover, because such data lacks sufficient diversity, these approaches typically exhibit limited capability in open-domain scenarios with novel objects and diverse environments. In this paper, we propose a novel paradigm that conditions robot manipulation tasks on language-reasoned segmentation masks generated by internet-scale foundation models. By integrating the mask modality, which carries semantic, geometric, and temporal-correlation priors derived from vision foundation models, into an end-to-end policy model, our approach robustly perceives object pose and enables sample-efficient generalization, including to new object instances, semantic categories, and unseen backgrounds. We first introduce a series of foundation models to ground natural-language demands across multiple tasks. Second, we develop a two-stream 2D policy model based on imitation learning, which processes raw images and object masks in a local-global perception manner to predict robot actions. Extensive real-world experiments on a Franka Emika robot arm demonstrate the effectiveness of the proposed paradigm and policy architecture. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
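The two-stream idea above can be illustrated with a minimal sketch: one stream encodes the raw (global) image, the other encodes the language-grounded object mask (local), and the fused features feed an action head. All layer sizes, the linear encoders, and the 7-DoF action dimension are illustrative placeholders, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Flatten an image batch and project it with a linear layer + ReLU.
    Stand-in for a real visual encoder (e.g. a CNN backbone)."""
    h = x.reshape(x.shape[0], -1) @ w
    return np.maximum(h, 0.0)

# Hypothetical dimensions: batch of 2, 64x64 inputs, 7-DoF action output.
B, H, W = 2, 64, 64
w_rgb  = rng.normal(scale=0.01, size=(H * W * 3, 32))  # global (raw image) stream
w_mask = rng.normal(scale=0.01, size=(H * W * 1, 32))  # local (object mask) stream
w_head = rng.normal(scale=0.01, size=(64, 7))          # fused action head

rgb  = rng.random((B, H, W, 3))                    # raw camera observation
mask = (rng.random((B, H, W, 1)) > 0.5).astype(float)  # segmentation mask input

# Local-global fusion: concatenate the two streams' features, then predict
# an action vector (e.g. end-effector pose delta plus gripper command).
feat = np.concatenate([encode(rgb, w_rgb), encode(mask, w_mask)], axis=1)
action = feat @ w_head
print(action.shape)  # (2, 7)
```

In an imitation-learning setup, a network of this shape would be trained to regress demonstrated actions; the mask stream is what injects the foundation-model priors into the policy.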