Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-source, widely applicable, generalist policies for robotic manipulation. As a first step, we introduce Octo, a large transformer-based policy trained on 800k trajectories from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. It can be instructed via language commands or goal images and can be effectively finetuned to robot setups with new sensory inputs and action spaces within a few hours on standard consumer GPUs. In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces. We also perform detailed ablations of design decisions for the Octo model, from architecture to training data, to guide future research on building generalist robot models.