ManipulationNet: An Infrastructure for Benchmarking Real-World Robot Manipulation with Physical Skill Challenges and Embodied Multimodal Reasoning

Yiting Chen, Kenneth Kimble, Edward H. Adelson, Tamim Asfour, Podshara Chanrungmaneekul, Sachin Chitta, Yash Chitambar, Ziyang Chen, Ken Goldberg, Danica Kragic, Hui Li, Xiang Li, Yunzhu Li, Aaron Prather, Nancy Pollard, Maximo A. Roa-Garzon, Robert Seney, Shuo Sha, Shihefeng Wang, Yu Xiang, Kaifeng Zhang, Yuke Zhu, Kaiyu Hang

from arXiv, 32 pages, 8 figures

Dexterous manipulation enables robots to purposefully alter the physical world, transforming them from passive observers into active agents in unstructured environments. This capability is the cornerstone of physical artificial intelligence. Despite decades of advances in hardware, perception, control, and learning, progress toward general manipulation systems remains fragmented due to the absence of widely adopted standard benchmarks. The central challenge lies in reconciling the variability of the real world with the reproducibility and authenticity required for rigorous scientific evaluation. To address this, we introduce ManipulationNet, a global infrastructure that hosts real-world benchmark tasks for robotic manipulation. ManipulationNet delivers reproducible task setups through standardized hardware kits, and enables distributed performance evaluation via a unified software client that delivers real-time task instructions and collects benchmarking results. As a persistent and scalable infrastructure, ManipulationNet organizes benchmark tasks into two complementary tracks: 1) the Physical Skills Track, which evaluates low-level physical interaction skills, and 2) the Embodied Reasoning Track, which tests high-level reasoning and multimodal grounding abilities. This design fosters the systematic growth of an interconnected network of real-world abilities and skills, paving the path toward general robotic manipulation. By enabling comparable manipulation research in the real world at scale, this infrastructure establishes a sustainable foundation for measuring long-term scientific progress and identifying capabilities ready for real-world deployment.
