The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is laborious because of the manual effort, operational costs, and safety challenges involved. A path toward such a universal agent would require a structured framework capable of wide generalization but trained within a reasonable data budget. In this paper, we develop an efficient system (RoboAgent) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies from small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using language commands. Using only 7,500 demonstrations, we train a single agent capable of 12 unique skills and demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, RoboAgent outperforms prior methods by over 40% in unseen situations while being more sample-efficient and amenable to capability improvements and extensions through fine-tuning. Videos are available at https://robopen.github.io/