Reinforcement learning has shown great potential for solving complex tasks when large amounts of data can be generated with little effort. In robotics, one approach to generating training data builds on simulations based on dynamics models derived from first principles. However, for tasks that involve, for instance, complex soft robots, devising such models is substantially more challenging. Training effectively with reinforcement learning in increasingly complicated scenarios makes it possible to take advantage of complex systems such as soft robots. Here, we leverage the imbalance in the complexity of the dynamics to learn more sample-efficiently. We (i) abstract the task into distinct components, (ii) off-load the parts with simple dynamics into the simulation, and (iii) multiply these virtual parts to generate more data in hindsight. Our new method, Hindsight States (HiS), uses this data and selects the most useful transitions for training. It can be used with an arbitrary off-policy algorithm. We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm, Hindsight Experience Replay (HER). Finally, we evaluate HiS on a physical system and show that it boosts performance on a complex table tennis task with a muscular robot. Videos and code of the experiments can be found at webdav.tuebingen.mpg.de/his/.
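The three steps named in the abstract (split the task into parts, simulate the part with simple dynamics, and pair one recorded trajectory with many simulated ones in hindsight) can be illustrated with a small toy sketch. This is not the authors' implementation. The 1D constant-velocity "ball", the distance-based reward, the names `Transition`, `simulate_virtual`, `reward_fn`, and `hindsight_states`, and in particular the selection rule (keep the virtual episodes with the highest summed reward) are all assumptions made for illustration. The abstract does not specify how HiS selects transitions. The toy also ignores any effect of the real part on the virtual part, such as a racket hitting a ball.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    real_state: float          # recorded part with complex dynamics (e.g. the robot)
    virtual_state: float       # simulated part with simple dynamics (e.g. a ball)
    action: float
    reward: float
    next_real_state: float
    next_virtual_state: float


def simulate_virtual(start: float, velocity: float, steps: int, dt: float = 0.1) -> List[float]:
    """Toy stand-in for the simple dynamics: constant velocity in 1D (assumption)."""
    return [start + velocity * dt * t for t in range(steps + 1)]


def reward_fn(real: float, virtual: float) -> float:
    """Toy reward: negative distance between the real and the virtual part (assumption)."""
    return -abs(real - virtual)


def hindsight_states(real_traj, actions, n_virtual, keep, rng):
    """Pair one recorded real trajectory with n_virtual simulated ones and keep the best."""
    steps = len(actions)
    assert len(real_traj) == steps + 1
    candidates = []
    for _ in range(n_virtual):
        # Sample a new virtual object and roll out its simple dynamics in simulation.
        virt = simulate_virtual(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), steps)
        # Recompute rewards in hindsight for the recorded real states and actions.
        episode = [
            Transition(real_traj[t], virt[t], actions[t],
                       reward_fn(real_traj[t + 1], virt[t + 1]),
                       real_traj[t + 1], virt[t + 1])
            for t in range(steps)
        ]
        candidates.append(episode)
    # Assumed selection rule (not from the source): rank virtual episodes by summed reward.
    candidates.sort(key=lambda ep: sum(tr.reward for tr in ep), reverse=True)
    return [tr for ep in candidates[:keep] for tr in ep]


# One recorded real rollout (4 states, 3 actions) yields several training episodes.
recorded_real = [0.0, 0.1, 0.2, 0.3]
recorded_actions = [0.1, 0.1, 0.1]
batch = hindsight_states(recorded_real, recorded_actions,
                         n_virtual=8, keep=2, rng=random.Random(0))
```

The returned transitions would then be added to the replay buffer of an off-policy learner. Because the relabeled virtual states differ from those the policy saw when it chose its actions, the resulting data only suits off-policy algorithms, which matches the abstract's statement about them.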