Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

A hallmark of intelligence is the ability to exhibit a wide range of effective behaviors. Inspired by this principle, Quality-Diversity algorithms, such as MAP-Elites, are evolutionary methods designed to generate a set of diverse and high-fitness solutions. However, as a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces, limiting its scalability to more complex domains such as learning to control agents directly from high-dimensional inputs. To address this limitation, methods like PGA-MAP-Elites and DCG-MAP-Elites combine actor-critic techniques from Reinforcement Learning with MAP-Elites, significantly improving the performance and efficiency of Quality-Diversity algorithms in complex, high-dimensional tasks. While these methods successfully leverage the trained critic to guide more effective mutations, the potential of the trained actor to improve both the quality and diversity of the evolved population remains underutilized. In this work, we introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that uses the descriptor-conditioned actor as a generative model to produce diverse solutions, which are then injected into the offspring batch at each generation. Additionally, we present an empirical analysis of the fitness and descriptor reproducibility of the solutions discovered by each algorithm. Finally, we present a second empirical analysis that sheds light on the synergies between the different variation operators and explains the performance improvement from PGA-MAP-Elites to DCRL-MAP-Elites.
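
To make the loop described in the abstract concrete, the following is a minimal sketch of a MAP-Elites generation in which part of the offspring batch comes from random mutation and part is injected by a descriptor-conditioned generator. It is an illustration under stated assumptions, not the authors' implementation: the toy `evaluate` task, the grid resolution `GRID`, the batch sizes, and in particular `actor_inject` (a fixed function standing in for a trained descriptor-conditioned actor) are all hypothetical.

```python
import random

GRID = 10  # cells per descriptor dimension (assumed resolution)


def evaluate(x):
    """Toy task (assumption): fitness is the negative squared norm,
    the descriptor is the first two coordinates clipped to [0, 1]."""
    fitness = -sum(v * v for v in x)
    descriptor = tuple(min(max(v, 0.0), 1.0) for v in x[:2])
    return fitness, descriptor


def cell_of(descriptor):
    """Map a descriptor in [0, 1]^2 to a grid cell index."""
    return tuple(min(int(d * GRID), GRID - 1) for d in descriptor)


def mutate(x, rng, sigma=0.1):
    """Random Gaussian mutation, the classic MAP-Elites variation operator."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def actor_inject(target_descriptor, dim):
    """Hypothetical stand-in for a descriptor-conditioned actor: returns a
    solution aimed at the requested descriptor. A real actor would be a
    policy network trained with reinforcement learning."""
    return list(target_descriptor) + [0.0] * (dim - 2)


def try_insert(archive, x):
    """Keep x only if its cell is empty or it beats the current elite."""
    fitness, descriptor = evaluate(x)
    cell = cell_of(descriptor)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, x)


def map_elites(generations=200, batch=16, n_inject=4, dim=4, seed=0):
    rng = random.Random(seed)
    archive = {}
    # Initialise the archive with random solutions.
    for _ in range(batch):
        try_insert(archive, [rng.uniform(0.0, 1.0) for _ in range(dim)])
    for _ in range(generations):
        elites = list(archive.values())
        # Offspring from random mutation of sampled elites.
        offspring = [mutate(rng.choice(elites)[1], rng)
                     for _ in range(batch - n_inject)]
        # Offspring injected by the descriptor-conditioned generator.
        offspring += [actor_inject((rng.random(), rng.random()), dim)
                      for _ in range(n_inject)]
        for x in offspring:
            try_insert(archive, x)
    return archive
```

Calling `map_elites()` returns a dictionary mapping grid cells to `(fitness, solution)` pairs. The split between mutated and injected offspring (`batch - n_inject` versus `n_inject`) is an arbitrary choice made for this sketch.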
