Unsupervised skill learning aims to learn a rich repertoire of behaviors without external supervision, providing artificial agents with the ability to control and influence the environment. However, without appropriate knowledge and exploration, skills may provide control only over a restricted area of the environment, limiting their applicability. Furthermore, it is unclear how to leverage the learned skill behaviors to adapt to downstream tasks in a data-efficient manner. We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. Our method decouples exploration from skill learning, which allows it to discover skills in the latent state space of the model. During adaptation, the agent uses a meta-controller to evaluate and adapt the learned skills efficiently by deploying them in parallel in imagination. Choreographer can learn skills both from offline data and from data collected simultaneously with an exploration policy. The skills can be used to adapt effectively to downstream tasks, as we show on the Unsupervised Reinforcement Learning (URL) benchmark, where we outperform previous approaches from both pixel and state inputs. The learned skills also explore the environment thoroughly, finding sparse rewards more frequently, as shown in goal-reaching tasks from the DMC Suite and Meta-World. Website and code: https://skillchoreographer.github.io/
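To make the adaptation idea concrete, the following is a minimal sketch of scoring a fixed set of skills by rolling each one out in an imagined model and picking the one with the highest imagined return. It is an illustration under stated assumptions, not Choreographer's implementation: the dynamics, reward, skill codes, and every function name below are hypothetical toy stand-ins, the meta-controller is reduced to a greedy argmax, and the skills are evaluated one after another, whereas a practical version would presumably batch the rollouts.

```python
import math

# Hypothetical toy setup: each "skill" is a heading angle in a 2-D latent space.
# None of these names come from the Choreographer codebase.
SKILL_CODES = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]


def skill_policy(state, code):
    """Toy skill-conditioned policy: take a fixed-size step along the skill's heading."""
    return (0.1 * math.cos(code), 0.1 * math.sin(code))


def world_model_step(state, action):
    """Stand-in for a learned latent dynamics model (here: simple addition)."""
    return (state[0] + action[0], state[1] + action[1])


def imagined_reward(state, goal):
    """Stand-in for a learned reward predictor: negative distance to the goal."""
    return -math.dist(state, goal)


def imagine_return(state, code, goal, horizon=10, gamma=0.99):
    """Discounted return of one skill, computed entirely inside the toy model."""
    total = 0.0
    for t in range(horizon):
        state = world_model_step(state, skill_policy(state, code))
        total += (gamma ** t) * imagined_reward(state, goal)
    return total


def select_skill(state, goal):
    """Greedy stand-in for a meta-controller: score every skill, return the best index."""
    returns = [imagine_return(state, code, goal) for code in SKILL_CODES]
    best = max(range(len(SKILL_CODES)), key=returns.__getitem__)
    return best, returns
```

Calling `select_skill((0.0, 0.0), (1.0, 0.0))` scores all four toy skills without taking a single step in a real environment, which is the property that makes evaluation in imagination data-efficient.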