We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate Autonomous Driving and Embodied AI and achieve state-of-the-art performance in both. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction, and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that, through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and mutually reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.