All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

Source: arXiv. Project website: https://imaei.github.io/project_pages/ario/

Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. Their limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by offering a unified data format, comprehensive sensory modalities, and a combination of real-world and simulated data. ARIO aims to improve the training of embodied AI agents, increasing their robustness and adaptability across various tasks and environments. Building upon the proposed standard, we present a large-scale unified ARIO dataset comprising approximately 3 million episodes collected from 258 series and 321,064 tasks. The ARIO standard and dataset represent a significant step towards bridging the gaps in existing data resources. By providing a cohesive framework for data collection and representation, ARIO paves the way for the development of more powerful and versatile embodied AI agents, capable of navigating and interacting with the physical world in increasingly complex and diverse ways. The project is available at https://imaei.github.io/project_pages/ario/

