Imitation learning is a primary approach to improving the efficiency of reinforcement learning by exploiting expert demonstrations. However, in many real-world scenarios, obtaining expert demonstrations can be extremely expensive or even impossible. To overcome this challenge, we propose a novel learning framework called Co-Imitation Learning (CoIL), which exploits the agents' own past good experiences without expert demonstrations. Specifically, we train two different agents by letting each of them alternately explore the environment and exploit the peer agent's experience. Because these experiences can be either valuable or misleading, we propose to estimate the potential utility of each piece of experience by the expected gain of the value function. The agents can thus selectively imitate each other, emphasizing the more useful experiences while filtering out noisy ones. Experimental results on various tasks show the significant superiority of the proposed CoIL framework, validating that the agents can benefit from each other without external supervision.
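The selective imitation step described above can be illustrated with a minimal sketch. The abstract does not give the exact utility formula, so the code below assumes one plausible form: the utility of a peer transition is the clipped gain of the peer's observed return over the imitating agent's own value estimate, normalized across the batch. The function names (`discounted_returns`, `utility_weights`, `weighted_imitation_loss`) and all numbers are hypothetical and are not taken from the paper.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go for every step of a single episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def utility_weights(peer_returns, own_values):
    """Assumed utility: how much the peer's return exceeds the agent's own
    value estimate, clipped at zero and normalized to sum to 1.
    Experiences with no gain get weight 0, i.e. they are filtered out."""
    gains = [max(g - v, 0.0) for g, v in zip(peer_returns, own_values)]
    total = sum(gains)
    return [x / total for x in gains] if total > 0 else gains


def weighted_imitation_loss(action_probs, peer_actions, weights):
    """Utility-weighted negative log-likelihood of the peer's actions
    under the imitating agent's policy."""
    return -sum(
        w * math.log(probs[a] + 1e-8)
        for probs, a, w in zip(action_probs, peer_actions, weights)
    )


# Toy example: one peer episode with three steps and two discrete actions.
peer_rewards = [0.0, 0.0, 1.0]
peer_actions = [0, 1, 1]
own_values = [0.5, 0.25, 0.5]          # the imitating agent's V(s) estimates
action_probs = [[0.5, 0.5]] * 3        # the imitating agent's policy outputs

peer_returns = discounted_returns(peer_rewards, gamma=0.5)
weights = utility_weights(peer_returns, own_values)
loss = weighted_imitation_loss(action_probs, peer_actions, weights)
print(weights, loss)
```

In a two-agent setup, each agent would apply this weighting to the other agent's stored experience during its imitation phase and then return to exploring the environment on its own. The first step in the toy episode receives zero weight because the peer's return there does not exceed the agent's own value estimate.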