Unsupervised skill discovery has seen significant recent progress, with various works proposing mutual-information-based objectives as a source of intrinsic motivation. Prior works predominantly focused on algorithms that require online access to the environment. In contrast, we develop an \textit{offline} skill discovery algorithm. Our problem formulation maximizes a mutual information objective subject to KL-divergence constraints. More precisely, the constraints ensure that the state occupancy of each skill remains close to the state occupancy of an expert, within the support of an offline dataset with good state-action coverage. Our main contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery, and to give a simple offline algorithm for learning diverse skills that are aligned with an expert.