Reinforcement learning has achieved great success in many decision-making tasks, and traditional reinforcement learning algorithms are mainly designed to find a single optimal solution. However, recent work shows the importance of learning diverse policies, which has made this an emerging research topic. Although a variety of diversity reinforcement learning algorithms have been proposed, none of them answers theoretically how the algorithm converges or how efficient it is. In this paper, we provide a unified diversity reinforcement learning framework and investigate the convergence of training diverse policies. Under this framework, we also propose a provably efficient diversity reinforcement learning algorithm. Finally, we verify the effectiveness of our method through numerical experiments.