Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-source, widely applicable, generalist policies for robotic manipulation. As a first step, we introduce Octo, a large transformer-based policy trained on 800k trajectories from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. It can be instructed via language commands or goal images and can be effectively finetuned to robot setups with new sensory inputs and action spaces within a few hours on standard consumer GPUs. In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces. We also perform detailed ablations of design decisions for the Octo model, from architecture to training data, to guide future research on building generalist robot models.