In this report, we introduce Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for high task performance and fast, smooth real-time execution. The key to our method lies in a carefully designed training recipe and deployment strategy. Xiaomi-Robotics-0 is first pre-trained on large-scale cross-embodiment robot trajectories and vision-language data, endowing it with broad and generalizable action-generation capabilities while avoiding catastrophic forgetting of the visual-semantic knowledge in the underlying pre-trained VLM. During post-training, we propose several techniques for training the VLA model for asynchronous execution, mitigating inference latency during real-robot rollouts. During deployment, we carefully align the timesteps of consecutive predicted action chunks to ensure continuous and seamless real-time rollouts. We evaluate Xiaomi-Robotics-0 extensively on simulation benchmarks and on two challenging real-robot tasks that require precise and dexterous bimanual manipulation. Results show that our method achieves state-of-the-art performance across all simulation benchmarks. Moreover, Xiaomi-Robotics-0 runs fast and smoothly on real robots using a consumer-grade GPU, achieving high success rates and throughput on both real-robot tasks. To facilitate future research, code and model checkpoints are open-sourced at https://xiaomi-robotics-0.github.io.
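The timestep alignment described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm; the fixed-latency model, chunk length, and all function names (`predict_chunk`, `run`) are assumptions made for illustration. The idea: while the model computes a new chunk, the robot keeps executing the old one, and when the new chunk arrives, its leading actions (whose timesteps already elapsed during inference) are skipped so execution continues from the matching timestep.

```python
CHUNK_LEN = 8   # actions per predicted chunk (assumed)
LATENCY = 3     # control steps consumed by one model inference (assumed)


def predict_chunk(start_step, chunk_len=CHUNK_LEN):
    # Stand-in for the VLA model: returns actions labeled with the
    # timesteps they are intended for (start_step .. start_step+chunk_len-1),
    # so a seamless rollout executes action t exactly at step t.
    return list(range(start_step, start_step + chunk_len))


def run(total_steps, chunk_len=CHUNK_LEN, latency=LATENCY):
    """Simulate an asynchronous rollout with timestep-aligned chunk switching."""
    executed = []
    step, idx = 0, 0
    chunk = predict_chunk(0, chunk_len)  # initial prediction (blocking)
    request_step = None                  # when the in-flight request was issued
    while step < total_steps:
        # Issue the next (non-blocking) prediction request early enough
        # that it arrives before the current chunk runs out.
        if request_step is None and len(chunk) - idx <= latency:
            request_step = step
        executed.append(chunk[idx])
        step += 1
        idx += 1
        # The new chunk "arrives" after `latency` steps of inference; skip
        # the actions whose timesteps already elapsed while it was computed.
        if request_step is not None and step - request_step == latency:
            new_chunk = predict_chunk(request_step, chunk_len)
            chunk = new_chunk[step - request_step:]
            idx = 0
            request_step = None
    return executed
```

With this alignment, the executed action sequence is gapless and overlap-free even though every prediction lags its observation by `latency` steps; without the `new_chunk[step - request_step:]` slice, each chunk switch would replay stale actions.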