This position paper outlines the authors' vision for a potential pathway towards generalist robots. Its purpose is to share the authors' excitement with the community and to highlight a promising research direction in robotics and AI. The authors believe the proposed paradigm is a feasible path towards the long-standing goal of robotics research: deploying robots, or embodied AI agents more broadly, in various non-factory real-world settings to perform diverse tasks. The paper presents a specific idea for mining the knowledge in the latest large-scale foundation models for robotics research. Instead of directly adapting these models or using them to guide low-level policy learning, it advocates using them to generate diversified tasks and scenes at scale, thereby scaling up low-level skill learning and ultimately leading to a foundation model for robotics that empowers generalist robots. The authors are actively pursuing this direction. At the same time, they recognize that building generalist robots through large-scale policy training demands significant resources, such as computing power and hardware, and that academic research groups alone may face severe resource constraints in implementing the entire vision. The authors therefore believe that sharing their thoughts at this early stage could foster discussion, attract industry interest in the proposed pathway and related topics, and potentially spur significant technical advances in the field.