Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. Their limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by offering a unified data format, comprehensive sensory modalities, and a combination of real-world and simulated data. ARIO aims to improve the training of embodied AI agents, increasing their robustness and adaptability across various tasks and environments. Building upon the proposed standard, we present a large-scale unified ARIO dataset, comprising approximately 3 million episodes collected from 258 series and 321,064 tasks. The ARIO standard and dataset represent a significant step towards bridging the gaps in existing data resources. By providing a cohesive framework for data collection and representation, ARIO paves the way for the development of more powerful and versatile embodied AI agents, capable of navigating and interacting with the physical world in increasingly complex and diverse ways. The project is available at https://imaei.github.io/project_pages/ario/