Towards A Foundation Model for Generalist Robots: Diverse Skill Learning at Scale via Automated Task and Scene Generation

This document serves as a position paper that outlines the authors' vision for a potential pathway towards generalist robots. The purpose of this document is to share the excitement of the authors with the community and highlight a promising research direction in robotics and AI. The authors believe the proposed paradigm is a feasible path towards accomplishing the long-standing goal of robotics research: deploying robots, or embodied AI agents more broadly, in various non-factory real-world settings to perform diverse tasks. This document presents a specific idea for mining knowledge in the latest large-scale foundation models for robotics research. Instead of directly adapting these models or using them to guide low-level policy learning, it advocates for using them to generate diversified tasks and scenes at scale, thereby scaling up low-level skill learning and ultimately leading to a foundation model for robotics that empowers generalist robots. The authors are actively pursuing this direction, but in the meantime, they recognize that the ambitious goal of building generalist robots with large-scale policy training demands significant resources such as computing power and hardware, and research groups in academia alone may face severe resource constraints in implementing the entire vision. Therefore, the authors believe sharing their thoughts at this early stage could foster discussions, attract interest towards the proposed pathway and related topics from industry groups, and potentially spur significant technical advancements in the field.
