Reinforcement learning has shown great potential for solving complex tasks when large amounts of data can be generated with little effort. In robotics, one approach to generating training data builds on simulations based on dynamics models derived from first principles. However, for tasks that involve, for instance, complex soft robots, devising such models is substantially more challenging. Being able to train effectively with reinforcement learning in increasingly complicated scenarios makes it possible to take advantage of complex systems such as soft robots. Here, we leverage the imbalance in the complexity of the dynamics to learn more sample-efficiently. We (i) abstract the task into distinct components, (ii) off-load the parts with simple dynamics into the simulation, and (iii) multiply these virtual parts to generate more data in hindsight. Our new method, Hindsight States (HiS), uses this data and selects the most useful transitions for training. It can be used with an arbitrary off-policy algorithm. We validate our method on several challenging simulated tasks and demonstrate that it improves learning both on its own and when combined with an existing hindsight algorithm, Hindsight Experience Replay (HER). Finally, we evaluate HiS on a physical system and show that it boosts performance on a complex table tennis task with a muscular robot. Videos and code of the experiments are available at webdav.tuebingen.mpg.de/his/.
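To make steps (i) to (iii) concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a 1-D setting in which the "real" part is a recorded racket trajectory and the "virtual" part is a ball with simple ballistic dynamics. All names (`Transition`, `simulate_ball`, `reward_fn`, `hindsight_states`, `select_top`), the reward, and the sampling ranges are hypothetical. Ranking by reward is only a stand-in for "selects the most useful transitions", since the abstract does not specify the selection criterion.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    robot_state: float         # real part, e.g. a 1-D racket position
    virtual_state: tuple       # simulated part: (ball position, ball velocity)
    action: float
    reward: float
    next_robot_state: float
    next_virtual_state: tuple


def simulate_ball(pos, vel, steps, dt=0.01, g=-9.81):
    """Roll out a toy 1-D ballistic ball; returns steps + 1 states."""
    states = [(pos, vel)]
    for _ in range(steps):
        vel = vel + g * dt
        pos = pos + vel * dt
        states.append((pos, vel))
    return states


def reward_fn(robot_state, ball_state):
    """Hypothetical reward: negative racket-to-ball distance."""
    return -abs(robot_state - ball_state[0])


def hindsight_states(robot_states, actions, n_virtual, rng):
    """Pair ONE recorded robot trajectory with n_virtual simulated balls.

    The robot data is reused unchanged; only the cheap virtual part is
    re-simulated, so each real step yields n_virtual transitions.
    """
    steps = len(actions)
    transitions = []
    for _ in range(n_virtual):
        # Assumed sampling ranges for the initial ball state.
        ball = simulate_ball(rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0), steps)
        for t in range(steps):
            transitions.append(Transition(
                robot_state=robot_states[t],
                virtual_state=ball[t],
                action=actions[t],
                reward=reward_fn(robot_states[t + 1], ball[t + 1]),
                next_robot_state=robot_states[t + 1],
                next_virtual_state=ball[t + 1],
            ))
    return transitions


def select_top(transitions, k):
    """Stand-in selection rule (an assumption): keep the k highest rewards."""
    return sorted(transitions, key=lambda tr: tr.reward, reverse=True)[:k]


rng = random.Random(0)
robot_states = [0.0, 0.1, 0.2, 0.3]   # one recorded trajectory (4 states)
actions = [0.1, 0.1, 0.1]             # 3 recorded actions
data = hindsight_states(robot_states, actions, n_virtual=5, rng=rng)
batch = select_top(data, k=4)         # would be fed to an off-policy learner
print(len(data))                      # 5 virtual rollouts x 3 steps = 15
```

The sketch ignores any influence of the virtual object on the robot, such as contact, and simply treats the two parts as independent.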