We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture, as a step toward general physical intelligence. Using three automated data construction pipelines that significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens and design a multi-task balanced reinforcement learning (RL) recipe to alleviate conflicts among heterogeneous tasks. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves state-of-the-art (SOTA) results on 16 out of 24 embodied VLM benchmarks, surpassing leading models such as Gemini-Robotics-ER-1.5 and GPT-5.4. Because these embodied capabilities are internalized, Embodied-R1.5 can be fine-tuned into a vision-language-action (VLA) model with only a small amount of data, outperforming leading VLA models such as $\pi_{0.5}$ across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments that validate performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source the model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.