Quality-Diversity (QD) algorithms have gained traction as optimisation methods because they are effective at escaping local optima and can generate solutions that are both wide-ranging and high-performing. Recently, Multi-Objective MAP-Elites (MOME) extended the QD paradigm to the multi-objective setting by maintaining a Pareto front in each cell of a MAP-Elites grid. MOME achieved global performance competitive with NSGA-II and SPEA2, two well-established Multi-Objective Evolutionary Algorithms (MOEAs), while also acquiring a diverse repertoire of solutions. However, MOME is limited by non-directed genetic search mechanisms, which struggle in high-dimensional search spaces. In this work, we present Multi-Objective MAP-Elites with Policy-Gradient Assistance and Crowding-based Exploration (MOME-PGX): a new QD algorithm that extends MOME to improve its data efficiency and performance. MOME-PGX uses gradient-based optimisation to efficiently drive solutions towards higher performance. It also introduces crowding-based mechanisms to improve exploration and to encourage uniformity across Pareto fronts. We evaluate MOME-PGX in four simulated robot locomotion tasks and demonstrate that it converges faster and to a higher performance than all baselines. We show that MOME-PGX is between 4.3 and 42 times more data-efficient than MOME and achieves twice the performance of MOME, NSGA-II and SPEA2 in challenging environments.
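To make the idea of "a Pareto front in each cell" concrete, the following is a minimal sketch of a MOME-style archive, not the authors' implementation. It assumes every objective is maximised and uses the NSGA-II-style crowding distance as a stand-in for a crowding measure; the abstract does not specify the exact crowding mechanisms used by MOME-PGX, and the policy-gradient component is omitted entirely. All names (`dominates`, `add_to_front`, `crowding_distances`, `insert`) are hypothetical.

```python
from typing import Dict, List, Tuple

Fitness = Tuple[float, ...]  # one value per objective (assumed: maximised)
Cell = Tuple[int, ...]       # index of a grid cell in descriptor space


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def add_to_front(front: List[Fitness], candidate: Fitness) -> List[Fitness]:
    """Offer a candidate to a Pareto front and return the updated front."""
    if any(dominates(f, candidate) or f == candidate for f in front):
        return front  # candidate is dominated or a duplicate: reject it
    # Keep only members the candidate does not dominate, then add it.
    return [f for f in front if not dominates(candidate, f)] + [candidate]


def crowding_distances(front: List[Fitness]) -> List[float]:
    """NSGA-II-style crowding distance (an assumed stand-in measure).

    Boundary points on each objective get infinity; larger values mean
    the point sits in a sparser region of the front.
    """
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all points share this objective value
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def insert(archive: Dict[Cell, List[Fitness]], cell: Cell, fitness: Fitness) -> None:
    """Offer a solution's fitness to the Pareto front stored in its cell."""
    archive[cell] = add_to_front(archive.get(cell, []), fitness)


# Usage: two cells, two objectives; (1.5, 2.0) is dominated by (2.0, 3.0).
archive: Dict[Cell, List[Fitness]] = {}
for cell, fit in [((0, 0), (1.0, 4.0)), ((0, 0), (2.0, 3.0)),
                  ((0, 0), (4.0, 1.0)), ((0, 0), (1.5, 2.0)),
                  ((1, 0), (0.5, 0.5))]:
    insert(archive, cell, fit)

distances = crowding_distances(archive[(0, 0)])
```

In a sketch like this, the crowding distances could be used to bias parent selection towards sparse regions of a front, or to decide which solution to discard when a cell's front exceeds a fixed capacity. Both uses are one plausible reading of "crowding-based mechanisms" rather than a description of the published method.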