One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision. However, current unsupervised skill discovery methods are often limited to acquiring simple, easy-to-learn skills because they lack incentives to discover more complex, challenging behaviors. We introduce a novel unsupervised skill discovery method, Controllability-aware Skill Discovery (CSD), which actively seeks complex, hard-to-control skills without supervision. The key component of CSD is a controllability-aware distance function, which assigns larger values to state transitions that are harder to achieve with the current skills. Combined with distance-maximizing skill discovery, CSD progressively learns more challenging skills over the course of training, because the jointly trained distance function reduces rewards for easy-to-achieve skills. Our experimental results in six robotic manipulation and locomotion environments demonstrate that CSD can discover diverse complex skills, including object manipulation and locomotion skills, with no supervision, significantly outperforming prior unsupervised skill discovery methods. Videos and code are available at https://sites.google.com/view/icml2023csd
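To make the idea of a controllability-aware distance concrete, the following is a minimal toy sketch, not the implementation described in the paper. It assumes a one-dimensional state change, a Gaussian model fitted to the transitions the current skills produce, and a squared Mahalanobis distance under that model; the function names and the data are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pvariance


def fit_transition_model(deltas, min_var=1e-6):
    """Fit a 1-D Gaussian to state changes (s' - s) produced by the current skills.

    Assumption for illustration: a single Gaussian over scalar state changes.
    """
    mean = fmean(deltas)
    var = max(pvariance(deltas, mu=mean), min_var)  # floor avoids division by zero
    return mean, var


def controllability_distance(delta, model):
    """Squared Mahalanobis distance of a state change under the fitted model.

    Changes the agent already produces often score low; rare ones score high.
    """
    mean, var = model
    return (delta - mean) ** 2 / var


# Hypothetical data: early in training the agent only produces small changes.
observed = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1]
model = fit_transition_model(observed)

easy = controllability_distance(0.05, model)  # a change the agent makes routinely
hard = controllability_distance(1.0, model)   # a change it has not yet achieved

# Later the agent has learned to produce the large change as well, so refitting
# the model on the new data makes that same change count as "easy".
observed_later = observed + [1.0, 0.9, 1.1, 1.0]
model_later = fit_transition_model(observed_later)
hard_later = controllability_distance(1.0, model_later)

print(easy < hard, hard_later < hard)
```

Under these assumptions, the large state change receives a much higher distance than the routine one at first, and its distance drops once the fitted model has seen the agent achieve it. This mirrors, in a simplified way, how a jointly trained distance function can reduce the reward for skills that have become easy.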