We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm to bridge human demonstrations and robotic execution. Architecturally, Being-H0.5 uses a Mixture-of-Transformers design featuring a novel Mixture-of-Flow (MoF) framework to decouple shared motor primitives from specialized embodiment-specific experts. Finally, to make cross-embodiment policies stable in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift and Universal Async Chunking to standardize chunked control across embodiments with different latency and control profiles. We empirically demonstrate that Being-H0.5 achieves state-of-the-art results on simulated benchmarks, such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.
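To give a concrete sense of the Unified Action Space idea, the sketch below shows one plausible way heterogeneous robot controls could be scattered into fixed, semantically aligned slots, with slots an embodiment lacks left zeroed. The slot names, widths, and the `to_unified` helper are illustrative assumptions, not the paper's actual layout.

```python
import numpy as np

# Hypothetical slot layout for a unified action space: each semantic
# slot has a fixed width, and every embodiment fills only the slots
# it actually possesses (names and widths are assumptions).
SLOT_WIDTHS = {"arm_joints": 7, "gripper": 1, "base": 3}

SLOT_OFFSETS = {}
_offset = 0
for _name, _width in SLOT_WIDTHS.items():
    SLOT_OFFSETS[_name] = _offset
    _offset += _width
UNIFIED_DIM = _offset  # 11 dimensions in this sketch


def to_unified(native_actions: dict) -> np.ndarray:
    """Scatter a robot's native controls into the shared slot layout.

    Slots the embodiment does not have remain zero; a real system
    would also carry a mask so the model can ignore absent slots.
    """
    unified = np.zeros(UNIFIED_DIM)
    for name, vec in native_actions.items():
        start = SLOT_OFFSETS[name]
        unified[start : start + SLOT_WIDTHS[name]] = vec
    return unified


# Example: a 7-DoF arm with a parallel gripper but no mobile base.
u = to_unified({"arm_joints": np.ones(7), "gripper": np.array([0.5])})
print(u.shape)        # (11,)
print(u[8:].sum())    # 0.0 -- the unused "base" slot stays empty
```

Aligning all embodiments on one fixed layout like this is what lets a low-resource robot reuse parameters trained on human data or richer platforms: the shared slots mean "close the gripper" occupies the same coordinates regardless of which robot produced the trajectory.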