We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. In contrast, BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," in which an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping, as well as prior unsupervised skill acquisition methods, on zero-shot execution of unseen, long-horizon tasks in new environments. Website at clvrai.com/boss.
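The skill-bootstrapping loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `bootstrap_skills`, `toy_propose_next`, and `PLAUSIBLE_NEXT` are hypothetical, the LLM is replaced by a hand-written lookup table, and the `execute` callback is an assumed stand-in for running a skill in the environment and reporting whether it completed. The sketch only shows the structure of the idea: chain skills that a guide considers meaningful, and add successful chains to the library as new skills.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Skill = Tuple[str, ...]  # a skill is a chain of one or more primitive skill names

# Hypothetical stand-in for LLM guidance: which primitive plausibly follows which.
PLAUSIBLE_NEXT: Dict[str, str] = {
    "pick up mug": "put mug in sink",
    "put mug in sink": "turn on faucet",
}


def toy_propose_next(chain: Skill, library: List[Skill]) -> Optional[Skill]:
    """Toy proposer (assumption): pick a library skill that plausibly follows the chain."""
    if not chain:
        # Arbitrary deterministic starting skill; None if the library is empty.
        return library[0] if library else None
    wanted = PLAUSIBLE_NEXT.get(chain[-1])
    for skill in library:
        if skill[0] == wanted:
            return skill
    return None  # nothing meaningful to chain next


def bootstrap_skills(
    primitives: Sequence[str],
    propose_next: Callable[[Skill, List[Skill]], Optional[Skill]],
    execute: Callable[[Skill], bool],
    rounds: int = 2,
    max_skills_per_chain: int = 3,
) -> List[Skill]:
    """Grow a skill library by chaining skills that the proposer suggests."""
    library: List[Skill] = [(p,) for p in primitives]
    for _ in range(rounds):
        chain: Skill = ()
        for _ in range(max_skills_per_chain):
            nxt = propose_next(chain, library)
            # Stop the episode if there is no suggestion or the skill fails.
            if nxt is None or not execute(nxt):
                break
            chain = chain + nxt
        # Keep only genuinely new, multi-step chains as library skills.
        if len(chain) > 1 and chain not in library:
            library.append(chain)
    return library


if __name__ == "__main__":
    prims = ["pick up mug", "put mug in sink", "turn on faucet"]
    # Assumed environment stub: every attempted skill succeeds.
    for skill in bootstrap_skills(prims, toy_propose_next, lambda s: True):
        print(" -> ".join(skill))
```

In this toy setting, the proposer plays the role the abstract assigns to the LLM: it narrows the choice of the next skill to one that forms a meaningful chain, rather than sampling skills at random as naive bootstrapping would.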