Human demonstrations have been shown to significantly improve agent performance in reinforcement learning (RL). However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while retaining the performance advantages by using a single human example, collected through a simple-to-use virtual reality simulation, to assist with RL training. Our method augments a single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly reduce training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this training advantage using a single human example, requiring less than a minute of human input. Moreover, despite learning from a human example, the agent is not constrained to human-level performance, often learning a policy that is significantly different from the human demonstration.