We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, BOSS (BOotStrapping your own Skills) learns to accomplish new tasks through "skill bootstrapping": an agent with a set of primitive skills interacts with the environment to practice new skills, without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs), which inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. Experiments in realistic household environments show that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping, as well as prior unsupervised skill acquisition methods, on zero-shot execution of unseen, long-horizon tasks in new environments. Website: clvrai.com/boss.
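To make the idea of skill bootstrapping concrete, the following is a minimal sketch of growing a skill library by chaining skills that succeed when practiced. It is an illustration under stated assumptions, not the BOSS implementation: the names `bootstrap_skills`, `propose_next`, and `try_skill` are hypothetical, `propose_next` stands in for the LLM guidance, `try_skill` stands in for executing a skill in the environment, and the sketch does not model policy learning at all.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: each skill name maps to the primitive
# skills it expands to. A primitive skill expands to itself.
SkillLibrary = Dict[str, Tuple[str, ...]]


def bootstrap_skills(
    library: SkillLibrary,
    propose_next: Callable[[List[str], List[str]], str],
    try_skill: Callable[[str], bool],
    episodes: int = 10,
    max_chain: int = 3,
) -> SkillLibrary:
    """Grow a copy of `library` by chaining skills that succeed in sequence.

    `propose_next(chain, available)` is a stand-in for LLM guidance and must
    return one name from `available`. `try_skill(name)` is a stand-in for
    practicing the skill in the environment and reports success or failure.
    """
    library = dict(library)  # leave the caller's library untouched
    for _ in range(episodes):
        chain: List[str] = []
        for _ in range(max_chain):
            candidate = propose_next(chain, sorted(library))
            if not try_skill(candidate):
                break  # the chain ends at the first failed skill
            chain.append(candidate)
        if len(chain) >= 2:
            # Store the successful chain as a new composite skill.
            name = " then ".join(chain)
            primitives = tuple(p for s in chain for p in library[s])
            library.setdefault(name, primitives)
    return library


# Toy stand-ins so the sketch runs end to end (both are assumptions).
def first_unused(chain: List[str], available: List[str]) -> str:
    """Pick the first available skill that is not already in the chain."""
    return next(s for s in available if s not in chain)


def always_succeeds(name: str) -> bool:
    """Pretend every practiced skill succeeds."""
    return True


primitive_skills: SkillLibrary = {
    "pick up mug": ("pick up mug",),
    "place mug in sink": ("place mug in sink",),
    "turn on faucet": ("turn on faucet",),
}
grown = bootstrap_skills(
    primitive_skills, first_unused, always_succeeds, episodes=1, max_chain=2
)
```

In this toy run, `grown` contains the three primitives plus one composite skill built from the first two. Replacing `first_unused` with a proposer that samples uniformly at random would correspond, loosely, to the naive bootstrapping baseline mentioned in the abstract.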