Model-based reinforcement learning is a promising learning strategy for practical robotic applications because it is more data-efficient than its model-free counterparts. However, current state-of-the-art model-based methods rely on shaped reward signals, which can be difficult to design and implement. To remedy this, we propose a simple model-based method tailored for sparse-reward multi-goal tasks that forgoes the need for complicated reward engineering. This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates. To improve exploration in the sparse-reward setting, the policy is trained with standard Hindsight Experience Replay and endowed with curiosity-based intrinsic rewards. Upon evaluation, this approach provides, on average, an order-of-magnitude increase in data efficiency over the state-of-the-art model-free method on the benchmark OpenAI Gym Fetch Robotics tasks.
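To make the hindsight relabelling step mentioned above concrete, the following is a minimal sketch and not the authors' implementation. It covers only the relabelling of stored transitions with goals achieved later in the same episode; the learned dynamics model that would generate imaginary transitions and the curiosity bonus are not shown. The function names `sparse_reward` and `her_relabel`, the "future" goal-sampling strategy, the value `k=4`, the 0.05 tolerance, and the 0/-1 reward convention are all assumptions chosen for illustration.

```python
import math
import random


def sparse_reward(achieved_goal, desired_goal, tol=0.05):
    """Assumed sparse convention: 0.0 if the goal is reached within `tol`, else -1.0."""
    return 0.0 if math.dist(achieved_goal, desired_goal) <= tol else -1.0


def her_relabel(episode, k=4, rng=None):
    """Augment an episode with hindsight goals (assumed 'future' strategy).

    `episode` is a list of dicts with keys `obs`, `action`,
    `achieved_goal` (the goal achieved after the action) and `desired_goal`.
    Returns the original transitions plus `k` relabelled copies of each.
    """
    rng = rng or random.Random(0)
    out = []
    for t, step in enumerate(episode):
        # Original transition, rewarded against the goal that was actually commanded.
        out.append({**step, "reward": sparse_reward(step["achieved_goal"], step["desired_goal"])})
        for _ in range(k):
            # Substitute a goal that was achieved at this step or later in the episode.
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future["achieved_goal"]
            out.append({**step,
                        "desired_goal": new_goal,
                        "reward": sparse_reward(step["achieved_goal"], new_goal)})
    return out


# Hypothetical failed episode: the gripper moves along x and never reaches (1.0, 1.0).
episode = [
    {"obs": (0.1 * t, 0.0), "action": (0.1, 0.0),
     "achieved_goal": (0.1 * (t + 1), 0.0), "desired_goal": (1.0, 1.0)}
    for t in range(5)
]
transitions = her_relabel(episode, k=4)
```

Even though every original transition in this episode is unsuccessful, some relabelled copies carry a success reward, which is what gives the policy a learning signal under sparse rewards. In a model-based variant such as the one described, the same relabelling could in principle be applied to transitions produced by a learned model rather than the real environment.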