In reinforcement learning (RL), sparse rewards pose a significant challenge. Expert actions can help overcome it, but explicit expert actions are costly to acquire, whereas expert observations are often more readily available. This paper presents a new approach that uses expert observations to learn robot manipulation tasks with sparse rewards from pixel observations. Specifically, our technique uses expert observations as intermediate visual goals for a goal-conditioned RL agent, enabling it to complete a task by successively reaching a series of goals. We demonstrate the efficacy of our method on five challenging block construction tasks in simulation and show that, when combined with two state-of-the-art agents, our approach significantly improves their performance while requiring 4 to 20 times fewer expert actions during training. Moreover, our method also outperforms a hierarchical baseline.
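The idea of completing a task by successively reaching a series of goals can be illustrated with a minimal sketch. This is not the implementation from the paper: the abstract does not specify the agent, the goal-reaching criterion, or the training procedure, so everything below is an assumption made for illustration. In particular, `PointEnv`, `toy_policy`, `goal_reached`, and `follow_goals` are hypothetical names, the observations are 2-D positions standing in for images, and the policy is hand-written rather than learned.

```python
import numpy as np


class PointEnv:
    """Hypothetical toy environment: the observation is a 2-D position.

    It stands in for a pixel-based manipulation task only to keep the
    sketch self-contained.
    """

    def __init__(self):
        self.pos = np.zeros(2)

    def reset(self):
        self.pos = np.zeros(2)
        return self.pos.copy()

    def step(self, action):
        # Each action moves the point by at most 0.1 per axis.
        self.pos = self.pos + np.clip(action, -0.1, 0.1)
        return self.pos.copy()


def toy_policy(obs, goal):
    """Stand-in for a learned goal-conditioned policy pi(a | obs, goal)."""
    return goal - obs


def goal_reached(obs, goal, tol=0.05):
    """Assumed success test: Euclidean distance below a tolerance."""
    return float(np.linalg.norm(obs - goal)) < tol


def follow_goals(env, policy, goals, max_steps_per_goal=100):
    """Condition the policy on each goal in turn, advancing once it is reached."""
    obs = env.reset()
    reached = 0
    for goal in goals:
        for _ in range(max_steps_per_goal):
            if goal_reached(obs, goal):
                break
            obs = env.step(policy(obs, goal))
        if not goal_reached(obs, goal):
            break  # stop if an intermediate goal cannot be reached
        reached += 1
    return obs, reached


# Expert observations used as an ordered list of intermediate goals.
expert_observations = [
    np.array([0.5, 0.0]),
    np.array([0.5, 0.5]),
    np.array([1.0, 0.5]),
]
final_obs, n_reached = follow_goals(PointEnv(), toy_policy, expert_observations)
```

In this sketch the expert supplies only observations, never actions; the policy has to work out for itself how to move from one intermediate goal to the next.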