This paper presents a novel approach to generalizing robot manipulation skills by combining a sampling-based task-and-motion planner with an offline reinforcement learning algorithm. Starting from a small library of scripted primitive skills (e.g., Push) and object-centric symbolic predicates (e.g., On(block, plate)), the planner autonomously generates a dataset of demonstrations of these skills, executed in the context of a long-horizon task. An offline reinforcement learning algorithm then extracts a policy from this dataset without further interaction with the environment, and the learned policy replaces the corresponding scripted skill in the library. Refining the skill library improves the robustness of the planner, which in turn facilitates data collection for more complex manipulation skills. We validate our approach in simulation on a block-pushing task and show that it requires less training data than conventional reinforcement learning methods. Furthermore, because the training data come from planner demonstrations, interaction with the environment is collision-free, making the approach more amenable to persistent robot learning in the real world.
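The abstract describes a loop: the planner uses the current skill library to generate demonstrations, an offline RL algorithm extracts a policy from those demonstrations alone, and the policy replaces the scripted skill. The sketch below illustrates that loop under heavy simplifying assumptions, all hypothetical and not taken from the paper. It uses a 1-D toy track in place of the simulated block-pushing task, and a scripted Push that sometimes slips the wrong way. A stand-in `collect_demos` takes the place of the sampling-based task-and-motion planner, and tabular fitted Q-iteration takes the place of whatever offline RL algorithm the authors use. The names `N`, `GOAL`, `make_scripted_push`, `collect_demos`, `fitted_q_iteration`, and `refine_skill` are illustrative only.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy domain (not the paper's simulator): a block on a 1-D track of
# N cells, with the plate at cell GOAL. A skill maps a block position to a push.
N, GOAL = 10, 7
ACTIONS = (-1, +1)  # push left, push right
Skill = Callable[[int], int]
Transition = Tuple[int, int, float, int, bool]  # (s, a, r, s_next, done)


def on_block_plate(s: int) -> bool:
    """Object-centric predicate On(block, plate) for the toy domain."""
    return s == GOAL


def step(s: int, a: int) -> Tuple[int, float, bool]:
    """Deterministic dynamics: one push moves the block one cell, clipped to the track."""
    s_next = min(max(s + a, 0), N - 1)
    done = on_block_plate(s_next)
    return s_next, (1.0 if done else 0.0), done


def make_scripted_push(rng: random.Random, slip: float = 0.3) -> Skill:
    """Hypothetical scripted primitive: push toward the plate, but slip the wrong way sometimes."""
    def push(s: int) -> int:
        toward = 1 if s < GOAL else -1
        return -toward if rng.random() < slip else toward
    return push


def collect_demos(library: Dict[str, Skill], episodes_per_start: int = 3,
                  max_steps: int = 500) -> List[Transition]:
    """Stand-in for the planner: achieve On(block, plate) from every start cell
    by repeatedly calling the library's Push skill, logging every transition."""
    data: List[Transition] = []
    for s0 in range(N):
        if on_block_plate(s0):
            continue
        for _episode in range(episodes_per_start):
            s = s0
            for _t in range(max_steps):
                a = library["Push"](s)
                s_next, r, done = step(s, a)
                data.append((s, a, r, s_next, done))
                s = s_next
                if done:
                    break
    return data


def fitted_q_iteration(data: List[Transition], gamma: float = 0.9,
                       sweeps: int = 50) -> Dict[Tuple[int, int], float]:
    """Tabular offline Q-iteration over a fixed dataset (no new environment steps).
    Pairs (s, a) that never appear in the data keep Q = 0, a pessimistic default."""
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _sweep in range(sweeps):
        new_q = dict(q)
        for s, a, r, s_next, done in data:
            # Dynamics are deterministic, so every copy of (s, a) yields the same target.
            bootstrap = 0.0 if done else gamma * max(q[(s_next, b)] for b in ACTIONS)
            new_q[(s, a)] = r + bootstrap
        q = new_q
    return q


def refine_skill(library: Dict[str, Skill], name: str) -> Dict[str, Skill]:
    """Collect planner data, learn a policy offline, and swap it into the library."""
    q = fitted_q_iteration(collect_demos(library))

    def learned(s: int) -> int:
        # Greedy policy extraction from the offline Q-table.
        return max(ACTIONS, key=lambda a: q[(s, a)])

    return {**library, name: learned}


# Usage: start from the scripted skill, then replace it with the learned policy.
library = {"Push": make_scripted_push(random.Random(0))}
library = refine_skill(library, "Push")
```

Because Q-values for state-action pairs absent from the dataset stay at zero, the extracted policy only prefers actions the demonstrations actually tried. This crudely mirrors the pessimism toward out-of-distribution actions that many offline RL methods build in. In this toy, the learned Push moves the block straight to the plate even though the scripted skill it was trained on slipped.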