Embodied agents capable of complex physical skills can improve productivity, elevate quality of life, and reshape human-machine collaboration. We aim to train embodied agents autonomously on a variety of tasks, relying mainly on large foundation models. These models are widely believed to be able to act as a brain for embodied agents; however, existing methods rely heavily on humans for task proposal and scene customization, which limits learning autonomy, training efficiency, and the generalization of the learned policies. In contrast, we introduce a brain-body synchronization ({\it BBSEA}) scheme to promote embodied learning in unknown environments without human involvement. The proposed scheme combines the wisdom of foundation models (``brain'') with the physical capabilities of embodied agents (``body''). Specifically, it leverages the ``brain'' to propose learnable physical tasks and success metrics, enabling the ``body'' to automatically acquire various skills by continuously interacting with the scene. We explore the proposed autonomous learning scheme in a table-top setting and demonstrate that the proposed synchronization can generate diverse tasks and develop multi-task policies with promising adaptability to new tasks and configurations. We will release our data, code, and trained models to facilitate future studies on building autonomously learning agents with large foundation models in more complex scenarios. More visualizations are available at \href{https://bbsea-embodied-ai.github.io}{https://bbsea-embodied-ai.github.io}.