In sparse-reward environments, policy search often lacks sufficient information about which behaviors to improve upon or avoid. The search is then bound to look blindly for reward-yielding transitions, since no early reward can bias it in one direction or another. One way to overcome this is to use intrinsic motivation to explore new transitions until a reward is found. In this work, we use a recently proposed definition of intrinsic motivation, Curiosity, in an evolutionary policy search method. We propose Curiosity-ES, an evolutionary strategy adapted to use Curiosity as a fitness metric. We compare Curiosity with Novelty, a commonly used diversity metric, and find that Curiosity can generate higher diversity over full episodes without the need for an explicit diversity criterion, and can lead to multiple policies that find reward.
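To make the idea of using Curiosity as a fitness metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes that curiosity is measured as the prediction error of a learned forward model, summed over an episode; the abstract does not give the exact definition. The toy chain environment, the linear policy, the tabular `ForwardModel`, the perturbation-based update in `curiosity_es`, and all hyperparameters are hypothetical choices made only for illustration.

```python
import random

N_STATES = 10      # hypothetical chain of states 0..N_STATES; reward only at the far end
EPISODE_LEN = 20   # hypothetical episode length

def step(state, action):
    """Toy sparse-reward chain: action is -1 or +1, reward 1.0 only in the last state."""
    nxt = min(max(state + action, 0), N_STATES)
    return nxt, (1.0 if nxt == N_STATES else 0.0)

def act(theta, state):
    """Deterministic linear policy on the normalized state."""
    return 1 if theta[0] + theta[1] * state / N_STATES > 0 else -1

class ForwardModel:
    """Tabular next-state predictor; its error is the assumed curiosity signal."""
    def __init__(self, lr=0.5):
        self.pred = {}
        self.lr = lr

    def error(self, s, a, s_next):
        # Unseen (state, action) pairs default to predicting "no movement".
        return abs(self.pred.get((s, a), float(s)) - s_next)

    def update(self, s, a, s_next):
        old = self.pred.get((s, a), float(s))
        self.pred[(s, a)] = old + self.lr * (s_next - old)

def rollout(theta, model):
    """Run one episode; return (curiosity, extrinsic reward, transitions)."""
    s, curiosity, reward, transitions = 0, 0.0, 0.0, []
    for _ in range(EPISODE_LEN):
        a = act(theta, s)
        s_next, r = step(s, a)
        curiosity += model.error(s, a, s_next)  # fitness ignores the extrinsic reward
        reward += r
        transitions.append((s, a, s_next))
        s = s_next
    return curiosity, reward, transitions

def curiosity_es(generations=30, pop=16, sigma=0.5, alpha=0.3, seed=0):
    """Perturbation-based evolution strategy whose fitness is episode curiosity."""
    rng = random.Random(seed)
    theta = [-1.0, 0.0]   # initially always moves left, so no reward is observed
    model = ForwardModel()
    best_reward = 0.0     # tracked for reporting only, never used for selection
    for _ in range(generations):
        noises = [[rng.gauss(0, 1) for _ in theta] for _ in range(pop)]
        results = [rollout([t + sigma * e for t, e in zip(theta, eps)], model)
                   for eps in noises]
        fits = [c for c, _, _ in results]
        best_reward = max(best_reward, max(r for _, r, _ in results))
        mean = sum(fits) / pop
        std = (sum((f - mean) ** 2 for f in fits) / pop) ** 0.5
        if std > 1e-8:
            # Move theta toward perturbations with above-average curiosity.
            for i in range(len(theta)):
                grad = sum((f - mean) / std * eps[i] for f, eps in zip(fits, noises))
                theta[i] += alpha / (pop * sigma) * grad
        # Train the forward model on this generation's transitions, so that
        # already-visited transitions become less "curious" next generation.
        for _, _, transitions in results:
            for s, a, s_next in transitions:
                model.update(s, a, s_next)
    return theta, best_reward
```

Calling `curiosity_es()` returns the final policy parameters and the best extrinsic reward seen during the search. Because the forward model is retrained after every generation, transitions that were already visited yield less fitness, which is what pushes the population toward new transitions in this sketch.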