Robotic systems that rely primarily on self-supervised learning have the potential to reduce the human annotation and engineering effort required to learn control strategies. Just as earlier robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior research showing that reinforcement learning (RL) itself can be cast as a self-supervised problem: learning to reach any goal without human-specified rewards or labels. Despite this apparent appeal, little (if any) prior work has demonstrated how self-supervised RL methods can be practically deployed on robotic systems. By first studying a challenging simulated version of this task, we identify design decisions about architectures and hyperparameters that improve the success rate by $2\times$. These findings lay the groundwork for our main result: a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks, where each task is specified by a single goal image provided after training.
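To make "RL cast as a self-supervised problem" concrete, the following is a minimal NumPy sketch of a generic contrastive goal-conditioned critic objective. It is an illustration under stated assumptions, not the authors' implementation. It assumes that some encoders (not shown) have already mapped each state-action pair to an embedding $\phi(s,a)$ and each goal image to an embedding $\psi(g)$, and that goals are relabeled in hindsight from future states of the same trajectory. The function names `sample_future_goal` and `infonce_critic_loss` and the discount parameter `gamma` are hypothetical.

```python
import numpy as np


def sample_future_goal(traj_len, t, gamma, rng):
    """Hypothetical hindsight relabeling: choose a future time index to use as the goal.

    The offset is drawn from a geometric distribution (support >= 1), so nearer
    future states are more likely. The index is clipped to the last step of the trajectory.
    """
    offset = rng.geometric(1.0 - gamma)
    return min(t + offset, traj_len - 1)


def infonce_critic_loss(sa_repr, goal_repr):
    """InfoNCE-style critic loss over a batch of (state-action, goal) pairs.

    sa_repr:   (B, d) embeddings phi(s, a), assumed to come from a learned encoder.
    goal_repr: (B, d) embeddings psi(g); row i is the relabeled goal for row i of sa_repr.
    The critic is the inner product f(s, a, g) = phi(s, a)^T psi(g). Goals from the
    other rows of the batch serve as negatives.
    """
    logits = sa_repr @ goal_repr.T                 # (B, B) critic values
    m = logits.max(axis=1, keepdims=True)          # stabilize the log-sum-exp
    log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # Cross-entropy with the matching goal (the diagonal) as the positive class.
    return float(np.mean(log_norm - np.diag(logits)))
```

Minimizing this loss trains the critic to score a state-action pair highly against goals that the agent actually reached later in the same trajectory. No reward signal or human label is needed, which is the sense in which the learning problem is self-supervised. At test time, a goal image would be encoded with the same assumed $\psi$ encoder and the policy would prefer actions with a high critic value for that goal.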