In reinforcement learning (RL), sparse rewards can pose a significant challenge. Expert actions can help overcome this issue, but acquiring explicit expert actions is often costly, whereas expert observations are usually more readily available. This paper presents a new approach that uses expert observations to learn robot manipulation tasks with sparse rewards from pixel observations. Specifically, we use expert observations as intermediate visual goals for a goal-conditioned RL agent, which completes a task by reaching these goals in succession. We demonstrate the efficacy of our method on five challenging simulated block construction tasks and show that, when combined with two state-of-the-art agents, our approach significantly improves their performance while requiring 4 to 20 times fewer expert actions during training. Our method also outperforms a hierarchical baseline.
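The core idea, an agent that pursues a sequence of expert observations one goal at a time, can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in rather than the paper's implementation: `LineWorld` replaces the block construction simulator, one-hot tuples replace pixel observations, `greedy_goal_policy` replaces a learned goal-conditioned agent, and exact equality replaces whatever goal-reaching test the method actually uses. The loop conditions the policy on the current expert observation, advances to the next one once it is reached, and treats the task as solved (the assumed sparse reward) only when the final expert observation is reached.

```python
from typing import Callable, List, Sequence, Tuple

Image = Tuple[int, ...]  # hypothetical toy stand-in for a pixel observation


def one_hot(index: int, width: int) -> Image:
    """Build a one-hot 'image' with a single lit pixel at `index`."""
    return tuple(1 if i == index else 0 for i in range(width))


class LineWorld:
    """Hypothetical toy environment: the agent moves along a 1-D strip of
    `width` cells and observes a one-hot image of its position."""

    def __init__(self, width: int, start: int = 0) -> None:
        self.width = width
        self.pos = start

    def render(self) -> Image:
        return one_hot(self.pos, self.width)

    def step(self, action: int) -> Image:
        # action is -1 (left), 0 (stay) or +1 (right); position is clipped
        self.pos = min(max(self.pos + action, 0), self.width - 1)
        return self.render()


def greedy_goal_policy(obs: Image, goal: Image) -> int:
    """Hypothetical goal-conditioned policy pi(a | o, g): step toward the
    lit pixel of the goal image."""
    here, target = obs.index(1), goal.index(1)
    return (target > here) - (target < here)


def _advance(obs: Image, expert_obs: Sequence[Image], goal_idx: int) -> int:
    # Skip every goal the current observation already matches. Exact equality
    # is an assumption; real pixels would need a learned or thresholded test.
    while goal_idx < len(expert_obs) and obs == expert_obs[goal_idx]:
        goal_idx += 1
    return goal_idx


def follow_expert_goals(
    env: LineWorld,
    expert_obs: Sequence[Image],
    policy: Callable[[Image, Image], int],
    max_steps: int = 100,
) -> Tuple[bool, List[int]]:
    """Pursue the ordered expert observations as intermediate goals.
    Returns (task_solved, index of the goal pursued at each step)."""
    obs = env.render()
    goal_idx = _advance(obs, expert_obs, 0)
    trace: List[int] = []
    for _ in range(max_steps):
        if goal_idx == len(expert_obs):
            break  # final expert observation reached
        trace.append(goal_idx)
        obs = env.step(policy(obs, expert_obs[goal_idx]))
        goal_idx = _advance(obs, expert_obs, goal_idx)
    # Sparse success signal: only reaching the last goal counts.
    return goal_idx == len(expert_obs), trace


if __name__ == "__main__":
    goals = [one_hot(2, 6), one_hot(4, 6), one_hot(5, 6)]
    solved, trace = follow_expert_goals(LineWorld(width=6), goals, greedy_goal_policy)
    print(solved, trace)
```

Starting at cell 0 with expert observations at cells 2, 4 and 5, the agent pursues goal 0 for two steps, goal 1 for two steps and goal 2 for one step before the task is solved. Splitting the episode this way gives the agent a nearby target at every step instead of a single distant, sparsely rewarded one.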