Real-world robotic manipulation tasks remain an elusive challenge, since they require both fine-grained environment interaction and the ability to plan for long-horizon goals. Although deep reinforcement learning (RL) methods have shown encouraging results when planning end-to-end in high-dimensional environments, they remain fundamentally limited by poor sample efficiency, which stems from inefficient exploration, and by the complexity of credit assignment over long horizons. In this work, we present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL to solve complex, long-horizon manipulation tasks. We leverage task-agnostic play data to learn a discrete behavioral prior over object-centric primitives that models their feasibility given the current context. We then design a high-level goal-conditioned policy which (1) uses primitives as building blocks to scaffold complex long-horizon tasks and (2) leverages the behavioral prior to accelerate learning. We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines across multiple realistic manipulation tasks and learns policies that can be easily transferred to physical hardware.
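To make the role of the behavioral prior concrete, the following is a minimal sketch of one way a discrete prior over primitives could restrict the choices of a high-level policy. It is an illustration under stated assumptions, not the ELF-P implementation: the thresholding rule, the epsilon-greedy selection, the function names (`feasible_primitives`, `select_primitive`), the primitive names, and all numeric values are hypothetical. The idea shown is only that primitives the prior considers infeasible in the current context are excluded before the policy explores or exploits.

```python
import random

def feasible_primitives(prior_probs, threshold=0.05):
    """Return indices of primitives whose prior probability is at least `threshold`.

    `prior_probs` stands in for the output of a learned behavioral prior
    for the current context (hypothetical values in this sketch).
    """
    feasible = [i for i, p in enumerate(prior_probs) if p >= threshold]
    if not feasible:
        # Fallback (an assumption): keep the single most likely primitive
        # so the agent always has at least one action available.
        feasible = [max(range(len(prior_probs)), key=lambda i: prior_probs[i])]
    return feasible

def select_primitive(q_values, prior_probs, epsilon=0.1, threshold=0.05, rng=random):
    """Epsilon-greedy selection restricted to primitives the prior deems feasible."""
    feasible = feasible_primitives(prior_probs, threshold)
    if rng.random() < epsilon:
        # Explore, but only among feasible primitives.
        return rng.choice(feasible)
    # Exploit: highest Q-value among feasible primitives.
    return max(feasible, key=lambda i: q_values[i])

# Hypothetical example with four object-centric primitives.
primitives = ["reach_drawer", "grasp_block", "open_drawer", "push_button"]
prior = [0.60, 0.02, 0.35, 0.03]   # assumed prior output for the current context
q_vals = [0.1, 0.9, 0.4, 0.2]      # assumed high-level Q-values

choice = select_primitive(q_vals, prior, epsilon=0.0)
print(primitives[choice])
```

In this sketch, `grasp_block` has the highest Q-value but falls below the prior threshold, so the greedy choice is made among the remaining feasible primitives. Restricting exploration in this way is one plausible route by which a prior learned from play data could reduce the number of environment interactions spent on infeasible actions.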