ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments

Ziyang Gong, Zehang Luo, Anke Tang, Zhe Liu, Shi Fu, Zhi Hou, Ganlin Yang, Weiyun Wang, Xiaofeng Wang, Jianbo Liu, Gen Luo, Haolan Kang, Shuang Luo, Yue Zhou, Yong Luo, Li Shen, Xiaosong Jia, Yao Mu, Xue Yang, Chunxiao Liu, Junchi Yan, Hengshuang Zhao, Dacheng Tao, Xiaogang Wang

Source: arXiv. Code: https://github.com/ACE-BRAIN-Team/ACE-Brain-0 Hugging Face: https://huggingface.co/ACE-Brain/ACE-Brain-0-8B

Universal embodied intelligence demands robust generalization across heterogeneous embodiments, such as autonomous driving, robotics, and unmanned aerial vehicles (UAVs). However, training a single unified embodied brain over diverse embodiments frequently runs into long-tail data distributions, gradient interference, and catastrophic forgetting, making it notoriously difficult to balance universal generalization with domain-specific proficiency. In this report, we introduce ACE-Brain-0, a generalist foundation brain that unifies spatial reasoning, autonomous driving, and embodied manipulation within a single multimodal large language model (MLLM). Our key insight is that spatial intelligence serves as a universal scaffold across diverse physical embodiments: although vehicles, robots, and UAVs differ drastically in morphology, they share a common need to model a 3D mental space, making spatial cognition a natural, domain-agnostic foundation for cross-embodiment transfer. Building on this insight, we propose the Scaffold-Specialize-Reconcile (SSR) paradigm, which first establishes a shared spatial foundation, then cultivates domain-specialized experts, and finally harmonizes them through data-free model merging. Furthermore, we adopt Group Relative Policy Optimization (GRPO) to strengthen the model's overall capability. Extensive experiments demonstrate that ACE-Brain-0 achieves competitive and even state-of-the-art performance across 24 spatial and embodiment-related benchmarks.
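
To make the Reconcile and GRPO steps concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which data-free merging method is used, so the sketch assumes simple task-vector averaging: each expert's weight change relative to the shared scaffold is averaged and added back to the scaffold. It also shows the group-relative advantage commonly used in GRPO, where each sampled response's reward is normalized by the mean and standard deviation of its group. The function names, the `scale` parameter, and the dictionary-of-lists checkpoint format are all hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def merge_task_vectors(base, experts, scale=1.0):
    """Merge expert checkpoints into a shared base without any training data.

    ASSUMPTION: task-vector averaging stands in for the unspecified
    data-free merging method. Checkpoints are dicts mapping a parameter
    name to a flat list of floats (a hypothetical toy format).
    """
    merged = {}
    for name, base_w in base.items():
        # Task vector: how far each expert moved away from the shared base.
        deltas = [
            [e - b for e, b in zip(expert[name], base_w)]
            for expert in experts
        ]
        # Average the task vectors element-wise across experts.
        avg_delta = [mean(column) for column in zip(*deltas)]
        # Add the averaged update back onto the base weights.
        merged[name] = [b + scale * d for b, d in zip(base_w, avg_delta)]
    return merged


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Uses the population standard deviation; implementations differ on
    this detail. `eps` guards against division by zero when all rewards
    in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    scaffold = {"w": [1.0, 2.0]}          # shared spatial foundation (toy)
    driving_expert = {"w": [2.0, 2.0]}    # hypothetical domain expert
    robot_expert = {"w": [1.0, 4.0]}      # hypothetical domain expert
    print(merge_task_vectors(scaffold, [driving_expert, robot_expert]))
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Averaging is only one of several data-free merging strategies; it is used here because it needs no calibration data and makes the "shared base plus expert deltas" structure of the paradigm easy to see.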
