Recent progress in deep reinforcement learning (RL) and computer vision enables artificial agents to solve complex tasks, such as locomotion, manipulation, and video games, from high-dimensional pixel observations. However, domain-specific reward functions are often hand-engineered to provide a sufficient learning signal, which requires expert knowledge. While it is possible to train vision-based RL agents using only sparse rewards, doing so introduces additional exploration challenges. We present a novel and efficient method that solves sparse-reward robot manipulation tasks from image observations alone by utilizing a few demonstrations. First, we learn an embedded neural dynamics model from demonstration transitions and further fine-tune it with data from the replay buffer. Next, we reward the agent for staying close to the demonstrated trajectories, using a distance metric defined in the embedding space. Finally, we use an off-policy, model-free vision RL algorithm to update the control policy. Our method achieves state-of-the-art sample efficiency in simulation and enables efficient training of a real Franka Emika Panda manipulator.
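The reward-shaping step described in the abstract (rewarding the agent for staying close to demonstrated trajectories in the embedding space) can be illustrated with a minimal sketch. The abstract does not specify the encoder, the distance metric, or the exact form of the reward, so everything below is an assumption made for illustration: `embed` is a hypothetical stand-in that uses a fixed random linear projection in place of the learned dynamics-model encoder, and `demo_distance_reward` uses the negative Euclidean distance to the nearest demonstration embedding as one plausible choice of metric.

```python
import numpy as np


def embed(obs, weights):
    """Hypothetical stand-in encoder (assumption): flatten each image and
    project it linearly. The method in the abstract would instead use the
    encoder of the learned embedded dynamics model."""
    flat = obs.reshape(obs.shape[0], -1).astype(np.float64)
    return flat @ weights


def demo_distance_reward(z, demo_z):
    """Assumed reward form: negative Euclidean distance from each embedding
    in z (shape [B, D]) to its nearest demonstration embedding in demo_z
    (shape [N, D]). Returns an array of shape [B]."""
    diffs = z[:, None, :] - demo_z[None, :, :]   # [B, N, D] via broadcasting
    dists = np.linalg.norm(diffs, axis=-1)       # [B, N]
    return -dists.min(axis=1)                    # closer to a demo => higher reward


# Toy data: 20 fake "demonstration frames" of size 8x8x3 (illustrative only).
rng = np.random.default_rng(0)
obs_dim = 8 * 8 * 3
weights = rng.normal(size=(obs_dim, 16)) / np.sqrt(obs_dim)
demo_obs = rng.random((20, 8, 8, 3))
demo_z = embed(demo_obs, weights)

# An observation taken from the demonstrations and an unrelated random one.
on_demo = demo_obs[:1]
off_demo = rng.random((1, 8, 8, 3))
r_on = demo_distance_reward(embed(on_demo, weights), demo_z)
r_off = demo_distance_reward(embed(off_demo, weights), demo_z)
```

In this sketch, an observation that lies on a demonstrated trajectory receives a reward of approximately zero, while an observation far from every demonstration receives a more negative reward. In a full pipeline, a shaped reward of this kind would presumably be combined with the sparse task reward and passed to the off-policy, model-free RL algorithm; how the two are combined is not stated in the abstract.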