The field of reinforcement learning is strong on headline achievements but weak at reapplying them to new problems: a computer that plays Go at a superhuman level can still be terrible at Tic-Tac-Toe. This paper asks whether the method used to train a network can improve its generalization. Specifically, we explore core quality-diversity (QD) algorithms, compare them against two recent algorithms, and propose a new algorithm that addresses shortcomings in existing methods. Although these methods perform well below what we had hoped, our work raises important points about the choice of behavior criterion in quality diversity, the interaction between differentiable and evolutionary training methods, and the role of offline reinforcement learning and randomized learning in evolutionary search.
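The point about choosing a behavior criterion can be made concrete with MAP-Elites. MAP-Elites is one widely used quality-diversity algorithm, but the abstract does not say which algorithms the paper studies, so treat it here only as an assumed representative. In the minimal sketch below, the behavior descriptor is a pluggable function that decides which archive cell a solution competes for. The toy fitness, the descriptor, the mutation scale, and the grid resolution are all hypothetical choices for illustration, not the paper's setup.

```python
import random
from typing import Callable, Dict, List, Tuple

Genome = List[float]
Archive = Dict[Tuple[int, ...], Tuple[float, Genome]]


def map_elites(
    fitness: Callable[[Genome], float],
    behavior: Callable[[Genome], Tuple[float, ...]],
    dim: int,
    bins: int,
    iterations: int,
    seed: int = 0,
) -> Archive:
    """Minimal MAP-Elites sketch (illustrative, not the paper's method)."""
    rng = random.Random(seed)
    archive: Archive = {}

    def cell(g: Genome) -> Tuple[int, ...]:
        # Discretize each behavior coordinate (assumed to lie in [0, 1])
        # into `bins` buckets; a value of exactly 1.0 goes in the last bucket.
        return tuple(min(int(b * bins), bins - 1) for b in behavior(g))

    def try_insert(g: Genome) -> None:
        # Keep the genome only if its cell is empty or it beats the current elite.
        f = fitness(g)
        c = cell(g)
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, g)

    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite with Gaussian noise, clipped to [0, 1].
            _, parent = rng.choice(list(archive.values()))
            child = [min(max(x + rng.gauss(0.0, 0.1), 0.0), 1.0) for x in parent]
        else:
            # Otherwise sample a fresh random genome in [0, 1]^dim.
            child = [rng.random() for _ in range(dim)]
        try_insert(child)
    return archive


if __name__ == "__main__":
    # Hypothetical toy problem: fitness peaks at 0.5 in every gene,
    # and the behavior criterion is simply the first two genes.
    def sphere(g: Genome) -> float:
        return -sum((x - 0.5) ** 2 for x in g)

    def first_two(g: Genome) -> Tuple[float, float]:
        return (g[0], g[1])

    result = map_elites(sphere, first_two, dim=4, bins=5, iterations=2000)
    print(f"filled cells: {len(result)} of 25")
```

The `fitness` function can stay fixed while `behavior` is swapped for a different descriptor. Doing so changes which solutions the archive keeps, because two genomes compete only when they land in the same cell. This is why the choice of behavior criterion shapes what a QD search ends up exploring.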