This position paper outlines the authors' vision for a potential pathway towards generalist robots. Its purpose is to share the authors' excitement with the community and to highlight a promising research direction in robotics and AI. The authors believe the proposed paradigm is a feasible path towards the long-standing goal of robotics research: deploying robots, or embodied AI agents more broadly, in varied non-factory real-world settings to perform diverse tasks. The paper presents a specific idea for mining the knowledge in the latest large-scale foundation models for robotics research. Instead of directly using or adapting these models to produce low-level policies and actions, it advocates a fully automated generative pipeline (termed generative simulation) that uses these models to generate diverse tasks, scenes, and training supervision at scale, thereby scaling up low-level skill learning and ultimately leading to a foundation model for robotics that empowers generalist robots. The authors are actively pursuing this direction. At the same time, they recognize that building generalist robots through large-scale policy training demands significant resources, such as computing power and hardware, and that academic research groups alone may face severe resource constraints in implementing the entire vision. The authors therefore believe that sharing their thoughts at this early stage could foster discussion, attract interest from industry groups in the proposed pathway and related topics, and potentially spur significant technical advances in the field.