Quality-Diversity (QD) algorithms have recently gained traction as optimisation methods because they are effective at escaping local optima and can generate wide-ranging, high-performing solutions. Multi-Objective MAP-Elites (MOME) extended the QD paradigm to the multi-objective setting by maintaining a Pareto front in each cell of a MAP-Elites grid. MOME achieved a global performance competitive with NSGA-II and SPEA2, two well-established Multi-Objective Evolutionary Algorithms (MOEAs), while also acquiring a diverse repertoire of solutions. However, MOME is limited by non-directed genetic search mechanisms, which struggle in high-dimensional search spaces. In this work, we present Multi-Objective MAP-Elites with Policy-Gradient Assistance and Crowding-based Exploration (MOME-PGX): a new QD algorithm that extends MOME to improve its data efficiency and performance. MOME-PGX uses gradient-based optimisation to efficiently drive solutions towards higher performance. It also introduces crowding-based mechanisms to create an improved exploration strategy and to encourage uniformity across Pareto fronts. We evaluate MOME-PGX in four simulated robot locomotion tasks and demonstrate that it converges faster and to a higher performance than all baselines. We show that MOME-PGX is between 4.3 and 42 times more data-efficient than MOME and doubles the performance of MOME, NSGA-II and SPEA2 in challenging environments.
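To make the idea of a Pareto front per grid cell and a crowding measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes all objectives are maximised, represents a cell's front as a plain list of fitness tuples, and uses the NSGA-II crowding distance as a stand-in, since the abstract does not specify the exact crowding mechanisms of MOME-PGX. The names `dominates`, `add_to_front`, `crowding_distances` and `grid` are hypothetical.

```python
from typing import Dict, List, Tuple

Fitness = Tuple[float, ...]  # one value per objective, all maximised (assumption)


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def add_to_front(front: List[Fitness], candidate: Fitness) -> List[Fitness]:
    """Return the cell's Pareto front after offering it one candidate."""
    if any(dominates(f, candidate) or f == candidate for f in front):
        return front  # candidate is dominated or a duplicate: front unchanged
    # Keep only the members the candidate does not dominate, then add it.
    return [f for f in front if not dominates(candidate, f)] + [candidate]


def crowding_distances(front: List[Fitness]) -> List[float]:
    """NSGA-II-style crowding distance; larger means a less crowded region."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # boundary points
        if hi == lo:
            continue  # all members equal on this objective: no contribution
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


# Hypothetical grid: behaviour-descriptor cell index -> Pareto front of fitnesses.
grid: Dict[Tuple[int, int], List[Fitness]] = {}
cell = (0, 0)
for fitness in [(1.0, 5.0), (2.0, 4.0), (2.1, 3.9), (5.0, 1.0), (1.5, 3.0)]:
    grid[cell] = add_to_front(grid.get(cell, []), fitness)

front = grid[cell]
distances = crowding_distances(front)
```

In this toy run the candidate `(1.5, 3.0)` is rejected because `(2.0, 4.0)` dominates it, and the two near-identical points `(2.0, 4.0)` and `(2.1, 3.9)` are the only members with a finite crowding distance. A crowding-aware scheme could use such scores to favour sparse regions of a front when selecting or replacing solutions; how MOME-PGX does this exactly is not stated in the abstract.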