Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Zhibo Zhao, Yu-Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk

Building general-purpose robots that operate seamlessly in any environment, with any object, and use a variety of skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, most existing robotic systems are constrained: they are designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively labeled data, rely on task-specific models, generalize poorly when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how existing foundation models from NLP and CV can be applied to robotics, and (ii) what a robotics-specific foundation model would look like. We begin with an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work that leverages existing foundation models for robotics and work that develops foundation models catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models to enable general-purpose robotic systems. We encourage readers to view our "living" GitHub repository of resources, which includes the papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.

