High-quality, representative data is essential for motion planning based on both Imitation Learning (IL) and Reinforcement Learning (RL). For real robots, collecting enough qualified data, whether as demonstrations for IL or as experience for RL, is challenging because of safety concerns in environments with obstacles. We address this challenge with the self-imitation learning by planning plus (SILP+) algorithm, which efficiently embeds experience-based planning into the learning architecture to mitigate the data-collection problem. The planner generates demonstrations from states successfully visited by the current RL policy, and the policy improves by learning from these demonstrations. In this way, SILP+ reduces the reliance on human expert operators to collect the demonstrations required by IL while also improving RL performance. Experimental results show that, in complex motion planning tasks, SILP+ achieves higher training efficiency and a higher, more stable success rate than several other methods. Extensive tests on physical robots demonstrate the effectiveness of SILP+ in a real-world setting.
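The abstract does not specify which planner SILP+ uses, so the following is only a minimal sketch of the general idea of turning successfully visited states into a demonstration. It assumes a simple graph-based planner: visited states closer than a hypothetical `radius` are connected (edges are assumed collision-free), Dijkstra's algorithm finds the shortest path from the start state to the goal-reaching state, and actions are assumed to be state displacements. All names (`build_graph`, `plan_demonstration`, `to_state_action_pairs`, `radius`) are illustrative, not the authors' API.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, ...]


def build_graph(states: Sequence[State], radius: float) -> List[List[Tuple[int, float]]]:
    """Connect every pair of visited states within `radius` (hypothetical rule;
    assumes short edges between visited states are collision-free)."""
    adj: List[List[Tuple[int, float]]] = [[] for _ in states]
    for i in range(len(states)):
        for j in range(i + 1, len(states)):
            d = math.dist(states[i], states[j])
            if d <= radius:
                adj[i].append((j, d))
                adj[j].append((i, d))
    return adj


def plan_demonstration(states: Sequence[State], start: int, goal: int,
                       radius: float) -> Optional[List[State]]:
    """Dijkstra shortest path over visited states; returns None if unreachable."""
    adj = build_graph(states, radius)
    dist = [math.inf] * len(states)
    prev = [-1] * len(states)
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if math.isinf(dist[goal]):
        return None
    # Walk predecessors back from the goal, then reverse into start-to-goal order.
    path: List[State] = []
    node = goal
    while node != -1:
        path.append(states[node])
        node = prev[node]
    path.reverse()
    return path


def to_state_action_pairs(path: List[State]) -> List[Tuple[State, State]]:
    """Label each state with the displacement to the next one (assumed action model)."""
    return [(s, tuple(b - a for a, b in zip(s, n))) for s, n in zip(path, path[1:])]


# A detour-heavy episode from the current policy that eventually reached the goal.
episode = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (2.0, 0.0)]
demo = plan_demonstration(episode, start=0, goal=len(episode) - 1, radius=1.0)
pairs = to_state_action_pairs(demo)
print(demo)   # [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(pairs)  # [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.0, 0.0))]
```

Here the five-state episode collapses to a three-state demonstration that skips the detour; in a self-imitation setup, pairs like these would be added to a demonstration buffer that the policy then imitates.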