Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from low efficiency, namely slow training and poor generalizability. In this paper, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and dynamic movement primitives (DMPs) to improve the training speed and generalizability of an off-policy RL agent. IBC exploits human demonstration data to accelerate the training of RL, and the DMP serves as a heuristic model that transfers the motion-planning problem to a simpler planning space. To support this, we also create a human demonstration dataset from a pick-and-place experiment that can be reused in similar studies. Comparison studies in simulation show that the proposed method outperforms conventional RL agents, with faster training and higher scores. A real-robot experiment demonstrates the applicability of the proposed method to a simple assembly task. Our work provides a novel perspective on using motion primitives and human demonstration to enhance the performance of RL in robot applications.