Although reinforcement learning has seen tremendous success recently, this kind of trial-and-error learning can be impractical or inefficient in complex environments. The use of demonstrations, on the other hand, enables agents to benefit from expert knowledge rather than having to discover the best action to take through exploration. In this survey, we discuss the advantages of using demonstrations in sequential decision making, various ways to apply demonstrations in learning-based decision-making paradigms (for example, reinforcement learning and planning in learned models), and how to collect demonstrations in various scenarios. Additionally, we illustrate a practical pipeline for generating and utilizing demonstrations in the recently proposed ManiSkill robot learning benchmark.