Current reinforcement learning (RL) often struggles with challenging exploration problems in which the desired outcomes or high rewards are rarely observed. Although curriculum RL, a framework that solves complex tasks by proposing a sequence of surrogate tasks, shows reasonable results, most previous works still have difficulty proposing a curriculum because they lack a mechanism for obtaining calibrated guidance to the desired outcome states without prior domain knowledge. To address this, we propose an uncertainty- and temporal-distance-aware curriculum goal generation method for outcome-directed RL that solves a bipartite matching problem. It not only provides precisely calibrated curriculum guidance toward the desired outcome states but also offers much better sample efficiency and a geometry-agnostic curriculum goal proposal capability compared to previous curriculum RL methods. We demonstrate, both quantitatively and qualitatively, that our algorithm significantly outperforms these prior methods on a variety of challenging navigation and robotic manipulation tasks.
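As a rough illustration of how curriculum goals could be chosen by bipartite matching, the sketch below assigns one distinct candidate state to each desired outcome so that the total cost is minimal. The abstract does not specify the cost, so everything here is an assumption made for illustration: the function name `match_curriculum_goals`, the `weight` parameter, and the cost form (estimated temporal distance to the outcome minus a weighted uncertainty score). It is not the authors' implementation.

```python
from itertools import permutations


def match_curriculum_goals(uncertainty, temporal_distance, weight=1.0):
    """Assign one distinct candidate to each desired outcome at minimum total cost.

    uncertainty[i]          -- hypothetical uncertainty score of candidate i
    temporal_distance[i][j] -- hypothetical estimated steps from candidate i to outcome j
    weight                  -- hypothetical trade-off between the two terms

    Returns (assignment, total_cost), where assignment[j] is the index of the
    candidate matched to outcome j.
    """
    n_candidates = len(uncertainty)
    n_outcomes = len(temporal_distance[0])
    if n_candidates < n_outcomes:
        raise ValueError("need at least as many candidates as desired outcomes")

    def cost(i, j):
        # Assumed cost: prefer candidates that are temporally close to the
        # outcome and that the agent is still uncertain about.
        return temporal_distance[i][j] - weight * uncertainty[i]

    best_assignment, best_cost = None, float("inf")
    # Brute force over all ways to give each outcome a distinct candidate.
    # This is only practical for tiny problems; it keeps the sketch dependency-free.
    for perm in permutations(range(n_candidates), n_outcomes):
        total = sum(cost(i, j) for j, i in enumerate(perm))
        if total < best_cost:
            best_assignment, best_cost = list(perm), total
    return best_assignment, best_cost


if __name__ == "__main__":
    uncertainty = [0.1, 0.9, 0.5]          # three candidate states
    temporal_distance = [                  # rows: candidates, columns: two outcomes
        [1.0, 4.0],
        [2.0, 2.0],
        [5.0, 1.0],
    ]
    print(match_curriculum_goals(uncertainty, temporal_distance, weight=1.0))
    print(match_curriculum_goals(uncertainty, temporal_distance, weight=10.0))
```

With a small `weight` the matching favors candidates that are temporally close to each outcome; with a large `weight` the high-uncertainty candidate is selected instead. For realistic numbers of candidates, a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` would replace the brute-force loop.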