We introduce a novel deep reinforcement learning (RL) approach called Movement Primitive-based Planning Policy (MP3). By integrating movement primitives (MPs) into the deep RL framework, MP3 generates smooth trajectories throughout the entire learning process while learning effectively from sparse and non-Markovian rewards. MP3 also retains the ability to adapt to changes in the environment during execution. Although many early successes in robot RL were achieved by combining RL with MPs, these approaches are often limited to learning single stroke-based motions and cannot adapt to task variations or adjust motions during execution. Our previous work introduced an episode-based RL method for the non-linear adaptation of MP parameters to different task variations; this paper extends that approach to incorporate replanning strategies. Replanning allows the MP parameters to be adapted throughout motion execution, which addresses the lack of online motion adaptation in stochastic domains that require feedback. We compared our approach against state-of-the-art deep RL methods and against methods that combine RL with MPs. The results demonstrated improved performance in sophisticated, sparse-reward settings and in domains requiring replanning.