With the development of fast, massively parallel evaluation in many domains, the potential of Quality-Diversity (QD) algorithms, which have already proved promising in a wide range of applications, has multiplied. However, we have yet to understand how best to use a large number of evaluations, as using them for random variations alone is not always effective. High-dimensional search spaces are a typical case in which random variations struggle to search effectively. Another is uncertain settings, where solutions can appear better than they truly are and naively evaluating more solutions might mislead QD algorithms. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD algorithm based on Evolution Strategies (ES) and designed to exploit fast parallel evaluations more effectively. MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and a reset mechanism designed for QD optimisation, all on a single GPU. We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks, demonstrating its benefit across domains. Additionally, given the same evaluation budget, our approach outperforms sampling-based QD methods in uncertain domains. Overall, MEMES generates reproducible, high-performing, and diverse solutions through large-scale ES optimisation on easily accessible hardware.
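The abstract does not give MEMES's exact update rules, so the following is only a minimal sketch of the overall structure it describes: a MAP-Elites grid archive fed by several ES processes, each following its own objective and resetting when it stops improving the archive. Everything concrete here is a hypothetical assumption, not the authors' method: the toy sphere task with a 2-D descriptor, the antithetic (OpenAI-style) ES gradient estimator, the choice of alternating a fitness objective with a novelty-style objective, the stagnation counter `PATIENCE` that triggers a reset from a random elite, and the sequential loop that stands in for parallel execution on a GPU.

```python
import math
import random

DIM = 10       # search-space dimension (toy choice)
GRID = 10      # cells per descriptor axis in the MAP-Elites grid
N_ES = 4       # number of ES processes (the abstract mentions up to 100)
POP = 16       # samples per ES step (even: antithetic pairs)
SIGMA = 0.1    # ES perturbation scale
LR = 0.05      # ES step size
PATIENCE = 5   # hypothetical reset rule: steps without archive improvement


def evaluate(x):
    """Toy task: fitness = negative sphere; descriptor = first two coords mapped to [0, 1)."""
    fitness = -sum(v * v for v in x)
    desc = tuple(min(max((v + 1.0) / 2.0, 0.0), 1.0 - 1e-9) for v in x[:2])
    return fitness, desc


def cell_of(desc):
    return tuple(int(d * GRID) for d in desc)


def try_insert(archive, x, fitness, desc):
    """MAP-Elites insertion: keep the best solution per cell; True if the archive improved."""
    cell = cell_of(desc)
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (x, fitness, desc)
        return True
    return False


def es_step(theta, objective, rng):
    """Antithetic ES gradient estimate of `objective` at theta, then one ascent step."""
    grad = [0.0] * len(theta)
    samples = []
    for _ in range(POP // 2):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        for sign in (1.0, -1.0):
            x = [t + sign * SIGMA * e for t, e in zip(theta, eps)]
            score = objective(x)
            samples.append(x)
            for i, e in enumerate(eps):
                grad[i] += sign * score * e
    scale = 1.0 / (POP * SIGMA)
    return [t + LR * scale * g for t, g in zip(theta, grad)], samples


def memes(generations=30, seed=0):
    rng = random.Random(seed)
    archive = {}

    def fitness_obj(x):
        return evaluate(x)[0]

    def novelty_obj(x):
        # Hypothetical exploration objective: distance to the nearest archived descriptor.
        d = evaluate(x)[1]
        if not archive:
            return 0.0
        return min(math.dist(d, entry[2]) for entry in archive.values())

    def new_start():
        # Hypothetical reset: restart from a random elite (random point if archive is empty).
        if archive:
            return list(rng.choice(list(archive.values()))[0])
        return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

    # Each ES process gets its own objective; here they alternate fitness / novelty.
    processes = [{"theta": new_start(),
                  "obj": fitness_obj if i % 2 == 0 else novelty_obj,
                  "stale": 0} for i in range(N_ES)]
    for _ in range(generations):
        for p in processes:  # sequential stand-in for parallel GPU execution
            p["theta"], samples = es_step(p["theta"], p["obj"], rng)
            improved = False
            for x in samples:
                f, d = evaluate(x)
                improved |= try_insert(archive, x, f, d)
            p["stale"] = 0 if improved else p["stale"] + 1
            if p["stale"] >= PATIENCE:
                p["theta"], p["stale"] = new_start(), 0
    return archive
```

For example, `memes(generations=20, seed=0)` returns a dict mapping grid cells to `(solution, fitness, descriptor)` elites. Every ES sample is offered to the archive, not just the ES mean, which is one simple way to turn a large evaluation budget into both optimisation progress and coverage.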