Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from slow training and poor generalizability. In this paper, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and dynamic movement primitives (DMPs) to improve the training speed and generalizability of an off-policy RL agent. IBC uses human demonstration data to accelerate RL training, and the DMP serves as a heuristic model that transfers motion planning into a simpler planning space. To support this, we also create a human demonstration dataset from a pick-and-place experiment, which can be used for similar studies. Comparison studies in simulation show that the proposed method trains faster and achieves higher scores than conventional RL agents. A real-robot experiment indicates the applicability of the proposed method to a simple assembly task. Our work provides a novel perspective on using motion primitives and human demonstration to improve the performance of RL in robot applications.
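To make the idea of a "simpler planning space" concrete, the following is a minimal sketch of a one-dimensional discrete DMP in its common textbook form: a critically damped spring-damper pulled toward a goal, plus a learned forcing term that fades out with a decaying phase variable. The abstract does not state which DMP variant the framework uses or how the RL agent parameterizes it, so the function name `dmp_rollout`, the gains, and the basis-width heuristic below are all illustrative assumptions rather than the authors' implementation.

```python
import math


def dmp_rollout(y0, goal, weights, tau=1.0, dt=0.001, alpha=25.0, alpha_x=3.0):
    """Integrate a 1-D discrete DMP with explicit Euler; return the positions.

    All names and default gains are hypothetical choices for illustration.
    """
    beta = alpha / 4.0  # makes the spring-damper critically damped
    n = len(weights)
    # Basis centers placed along the phase x, which decays from 1 toward 0.
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    # Width heuristic (an assumption): narrower bases where centers are dense.
    widths = [n ** 1.5 / c for c in centers]

    y, v, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(round(tau / dt))):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        total = sum(psi)
        if total > 1e-12:
            # Normalized weighted sum, gated by x and scaled by (goal - y0).
            f = sum(w * p for w, p in zip(weights, psi)) / total * x * (goal - y0)
        else:
            f = 0.0
        dv = (alpha * (beta * (goal - y) - v) + f) / tau
        dy = v / tau
        dx = -alpha_x * x / tau
        v += dv * dt
        y += dy * dt
        x += dx * dt
        traj.append(y)
    return traj


# With zero weights the DMP reduces to a smooth point-to-point reach.
plain = dmp_rollout(y0=0.0, goal=1.0, weights=[0.0] * 5)
# Non-zero weights reshape the path on the way to the goal.
shaped = dmp_rollout(y0=0.0, goal=1.0, weights=[50.0] * 5)
```

In a sketch like this, a planner only has to choose a handful of quantities (here the goal and five weights) instead of a command at every time step, which is one way to read the claim that a DMP moves planning into a simpler space.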