Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible on real robots because of the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient by reusing previously collected sub-optimal data. In this paper, we demonstrate how an improved understanding of off-policy learning methods, together with their embedding in an iterative online/offline scheme (``collect and infer''), can drastically improve data efficiency by using all the collected experience, which makes learning from real robot experience alone feasible. Moreover, the resulting policy improves significantly over the state of the art on a recently proposed real robot manipulation benchmark. Our approach learns end-to-end, directly from pixels, and does not rely on additional human domain knowledge such as a simulator or demonstrations.