We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. To learn a set of parameterized skills, we combine off-policy Meta-RL with a trajectory-centric smoothness term. Our agent can then use these learned skills to construct a three-level hierarchical framework that models a Temporally-extended Parameterized Action Markov Decision Process. We empirically demonstrate that the proposed algorithms enable an agent to solve a set of difficult long-horizon tasks in obstacle-course and robot-manipulation domains.
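To make the notion of a temporally-extended parameterized action concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy one-dimensional environment, two hand-written stand-ins for learned skills, and a fixed skill horizon; all names (`ParameterizedAction`, `run_parameterized_action`, `toy_env_step`, `high_level_policy`) are hypothetical and chosen only for illustration. The three levels are: choosing a discrete skill, choosing its continuous parameter, and executing the skill's low-level policy for several primitive steps.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; none of these names come from the paper.
State = float
# A skill maps (state, continuous skill parameter z) to a primitive action.
SkillPolicy = Callable[[State, float], float]


@dataclass
class ParameterizedAction:
    skill_id: int  # top level: which discrete skill to run
    z: float       # middle level: continuous parameter for that skill
    horizon: int   # how many primitive steps the skill runs (assumed fixed)


def toy_env_step(state: State, action: float) -> Tuple[State, float]:
    """Toy 1-D dynamics (an assumption): the action shifts the state.

    The reward is the negative distance to a goal located at 10.
    """
    next_state = state + action
    return next_state, -abs(next_state - 10.0)


def run_parameterized_action(
    state: State, pa: ParameterizedAction, skills: List[SkillPolicy]
) -> Tuple[State, float]:
    """Bottom level: execute the chosen skill for pa.horizon primitive steps."""
    total_reward = 0.0
    for _ in range(pa.horizon):
        action = skills[pa.skill_id](state, pa.z)
        state, reward = toy_env_step(state, action)
        total_reward += reward
    return state, total_reward


# Hand-written stand-ins for learned parameterized skills.
def move_right(state: State, z: float) -> float:
    return abs(z)


def move_left(state: State, z: float) -> float:
    return -abs(z)


def high_level_policy(state: State, rng: random.Random) -> ParameterizedAction:
    """Placeholder for a learned policy over (skill_id, z) pairs."""
    skill_id = 0 if state < 10.0 else 1  # head toward the goal at 10
    z = rng.uniform(0.5, 1.0)            # step size used by the skill
    return ParameterizedAction(skill_id=skill_id, z=z, horizon=5)


if __name__ == "__main__":
    rng = random.Random(0)
    skills = [move_right, move_left]
    state = 0.0
    for _ in range(4):
        pa = high_level_policy(state, rng)
        state, ret = run_parameterized_action(state, pa, skills)
        print(pa, state, ret)
```

In this sketch the high-level policy makes one decision per five primitive steps, which is the sense in which a parameterized action space can shorten the effective horizon of a task. In the setting the abstract describes, the skills would be learned rather than hand-written.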