Demonstrations provide insight into relevant regions of the state or action space and can substantially improve the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy. The sequence model forms a latent space prior over plausible demonstration behaviors, which accelerates the learning of high-level policies. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning on transfer tasks. Our experimental results confirm that latent space priors provide significant gains in both learning speed and final performance. We benchmark our approach on a set of challenging sparse-reward environments with a complex simulated humanoid, and on offline RL benchmarks for navigation and object manipulation. Videos, source code and pre-trained models are available on the project website at https://facebookresearch.github.io/latent-space-priors .
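The abstract says that several methods for integrating a latent space prior into high-level policy learning are explored, but it does not spell them out. As an illustration only, the sketch below shows one generic option, KL regularization. The high-level policy's latent distribution is penalized for diverging from the distribution predicted by a sequence-model prior. It assumes diagonal Gaussian latents, and `hypothetical_prior` is a hypothetical stand-in for a learned sequence model that simply predicts the next latent near the last one. The names `DiagGaussian`, `kl_diag_gaussian`, `regularized_reward` and the weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGaussian:
    """Diagonal Gaussian distribution over a latent vector z (assumed latent form)."""
    mean: List[float]
    std: List[float]


def kl_diag_gaussian(p: DiagGaussian, q: DiagGaussian) -> float:
    """Closed-form KL(p || q) for diagonal Gaussians, summed over latent dimensions."""
    assert len(p.mean) == len(q.mean) == len(p.std) == len(q.std)
    kl = 0.0
    for mp, sp, mq, sq in zip(p.mean, p.std, q.mean, q.std):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


def hypothetical_prior(latent_history: Sequence[List[float]], std: float = 0.5) -> DiagGaussian:
    """HYPOTHETICAL stand-in for a learned sequence model over latents.

    A real prior would condition on the whole latent history. This toy version
    just predicts that the next latent stays close to the most recent one.
    """
    last = latent_history[-1]
    return DiagGaussian(mean=list(last), std=[std] * len(last))


def regularized_reward(task_reward: float, policy: DiagGaussian,
                       prior: DiagGaussian, alpha: float = 0.1) -> float:
    """Task reward minus alpha times the divergence from the latent space prior."""
    return task_reward - alpha * kl_diag_gaussian(policy, prior)


if __name__ == "__main__":
    history = [[0.0, 0.0], [0.1, -0.2]]  # previously chosen latents (toy values)
    prior = hypothetical_prior(history)
    on_prior = DiagGaussian(mean=[0.1, -0.2], std=[0.5, 0.5])
    off_prior = DiagGaussian(mean=[0.6, -0.2], std=[0.5, 0.5])
    print(regularized_reward(1.0, on_prior, prior))   # no penalty: matches the prior
    print(regularized_reward(1.0, off_prior, prior))  # penalized for leaving the prior
```

In this toy setup, a policy whose latent distribution matches the prior receives the unmodified task reward. Shifting its mean away from the prior reduces the reward in proportion to `alpha`. Other integration schemes, such as sampling latents directly from the prior for exploration, would replace the penalty term rather than tune it.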