Embodied AI research is increasingly moving beyond single-task, single-environment policy learning toward multi-task, multi-scene, and multi-model settings. This shift substantially increases the engineering overhead and development time of stages such as evaluation-environment construction, trajectory collection, model training, and evaluation. To address this challenge, we propose a new paradigm for embodied AI development in which users express goals and constraints through conversation, and the system automatically plans and executes the corresponding development workflow. We instantiate this paradigm with EmbodiedClaw, a conversational agent that turns frequent, costly embodied research activities (environment creation and revision, benchmark transformation, trajectory synthesis, model evaluation, and asset expansion) into executable skills. Experiments on end-to-end workflow tasks, capability-specific evaluations, user studies with human researchers, and ablations show that EmbodiedClaw reduces manual engineering effort while improving executability, consistency, and reproducibility. These results suggest a shift from manual toolchains to conversationally executable workflows for embodied AI development.