GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

GigaBrain Team, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Hao Li, Jie Li, Jindi Lv, Jingyu Liu, Lv Feng, Mingming Yu, Peng Li, Qiuping Deng, Tianze Liu, Xinyu Zhou, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yifei Nie, Yilong Li, Yukun Zhou, Yun Ye, Zhichao Liu, Zheng Zhu

From arXiv. Project page: https://gigabrain05m.github.io/

Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations face inherent limitations due to constrained scene understanding and weak future anticipation capabilities. In contrast, video world models pre-trained on web-scale video corpora exhibit robust spatiotemporal reasoning and accurate future prediction, making them a natural foundation for enhancing VLA learning. Therefore, we propose \textit{GigaBrain-0.5M*}, a VLA model trained via world model-based reinforcement learning. It is built upon \textit{GigaBrain-0.5}, which is pre-trained on over 10,000 hours of robotic manipulation data and whose intermediate version currently ranks first on the international RoboChallenge benchmark. \textit{GigaBrain-0.5M*} further integrates world model-based reinforcement learning via \textit{RAMP} (Reinforcement leArning via world Model-conditioned Policy) to enable robust cross-task adaptation. Empirical results demonstrate that \textit{RAMP} achieves substantial performance gains over the RECAP baseline, yielding improvements of approximately 30\% on challenging tasks including \texttt{Laundry Folding}, \texttt{Box Packing}, and \texttt{Espresso Preparation}. Critically, \textit{GigaBrain-0.5M*} exhibits reliable long-horizon execution, consistently accomplishing complex manipulation tasks without failure, as validated by real-world deployment videos on our \href{https://gigabrain05m.github.io}{project page}.
