A hallmark of intelligent agents is the ability to learn reusable skills purely from unsupervised interaction with the environment. However, existing unsupervised skill discovery methods often learn entangled skills, where a single skill variable simultaneously influences many entities in the environment, making downstream skill chaining extremely challenging. We propose Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks. DUSDi decomposes skills into disentangled components, where each skill component affects only one factor of the state space. Importantly, these skill components can be concurrently composed to generate low-level actions, and efficiently chained to tackle downstream tasks through hierarchical Reinforcement Learning. DUSDi defines a novel mutual-information-based objective to enforce disentanglement between the influences of different skill components, and utilizes value factorization to optimize this objective efficiently. Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills, and significantly outperforms previous skill discovery methods when the learned skills are applied to downstream tasks. Code and skill visualizations are available at jiahenghu.github.io/DUSDi-site/.
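The disentanglement objective described above can be sketched with a variational surrogate: each skill component z_i should be decodable from its own state factor s_i, but not from any other factor. The snippet below is a minimal illustration of this idea, not the paper's implementation; the matrix of discriminator log-probabilities `logq` is an assumed input (in practice it would come from learned discriminators q(z_k | s_j)).

```python
import numpy as np

def disentangled_skill_reward(logq, i):
    """Hedged sketch of a per-factor skill-discovery reward.

    logq[j, k] is an assumed discriminator log-probability
    log q(z_k | s_j): how well skill component z_k can be predicted
    from state factor s_j alone.

    The reward for factor i rewards z_i being decodable from s_i
    (a variational lower bound on I(S_i; Z_i)) and penalizes other
    components z_k, k != i, being decodable from s_i (discouraging
    I(S_i; Z_k)), which is the disentanglement constraint.
    """
    n = logq.shape[0]
    others = [k for k in range(n) if k != i]
    return logq[i, i] - np.mean([logq[i, k] for k in others])

# Example: 3 state factors; z_0 is well predicted from s_0 only.
logq = np.full((3, 3), -1.0)   # all components poorly predicted
logq[0, 0] = -0.1              # except z_0 from its own factor s_0
r0 = disentangled_skill_reward(logq, 0)  # high: disentangled influence
```

Summing such per-factor rewards over i recovers an overall objective of the form Σ_i [I(S_i; Z_i) − Σ_{k≠i} I(S_i; Z_k)], which the abstract's value-factorization step would then optimize efficiently.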