Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior work has defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. We therefore propose SPRINT, a scalable offline policy pre-training approach that substantially reduces the human effort needed to pre-train a diverse set of skills. Our method uses two core ideas to automatically expand a base set of pre-training tasks: instruction relabeling via large language models and cross-trajectory skill chaining through offline reinforcement learning. As a result, SPRINT pre-training equips robots with a much richer repertoire of skills. Experimental results in a household simulator and on a real-robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks than previous pre-training approaches. Website: https://clvrai.com/sprint.