The success of many RL techniques relies heavily on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. We propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structure of a task, DrS learns a high-quality dense reward from sparse rewards and, when available, demonstrations. The learned rewards can be \textit{reused} in unseen tasks, reducing the human effort required for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that the learned rewards can be reused in unseen tasks, improving both the performance and the sample efficiency of RL algorithms. On some tasks, the learned rewards even achieve performance comparable to that of human-engineered rewards. See our project page (https://sites.google.com/view/iclr24drs) for more details.