In a variety of domains, from robotics to finance, Quality-Diversity algorithms have been used to generate collections of both diverse and high-performing solutions. Multi-Objective Quality-Diversity algorithms have emerged as a promising approach for applying these methods to complex, multi-objective problems. However, existing methods are limited by their search capabilities. For example, Multi-Objective Map-Elites depends on random genetic variations, which struggle in high-dimensional search spaces. Despite efforts to enhance search efficiency with gradient-based mutation operators, existing approaches update solutions to improve each objective separately rather than to achieve desired trade-offs. In this work, we address this limitation by introducing Multi-Objective Map-Elites with Preference-Conditioned Policy-Gradient and Crowding Mechanisms: a new Multi-Objective Quality-Diversity algorithm that uses preference-conditioned policy-gradient mutations to efficiently discover promising regions of the objective space and crowding mechanisms to promote a uniform distribution of solutions on the non-dominated front. We evaluate our approach on six robotics locomotion tasks, including two newly proposed tri-objective tasks, and show that our method outperforms or matches all state-of-the-art Multi-Objective Quality-Diversity methods on all six. Importantly, our method also achieves a smoother set of trade-offs, as measured by newly proposed sparsity-based metrics.
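To illustrate the difference between optimizing each objective separately and optimizing toward a chosen trade-off, the sketch below shows one common way a preference can condition an update: sampling a preference vector from the simplex and scalarizing the objectives with a weighted sum. This is a minimal, generic illustration; the function names and the simplex-sampling scheme are assumptions for exposition, not the algorithm as specified in this work.

```python
import random


def sample_preference(num_objectives, rng=None):
    """Sample a preference (weight) vector uniformly from the probability simplex.

    Uses the gap-between-sorted-uniforms construction: the differences between
    consecutive sorted uniform draws (padded with 0 and 1) sum to 1.
    """
    rng = rng or random.Random()
    cuts = sorted(rng.random() for _ in range(num_objectives - 1))
    points = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(points, points[1:])]


def scalarize(objectives, preference):
    """Weighted-sum scalarization: a single score for a chosen trade-off.

    A policy-gradient step on this score pushes a solution toward the region
    of the objective space selected by the preference, rather than improving
    one objective in isolation.
    """
    return sum(w * f for w, f in zip(preference, objectives))


# Hypothetical bi-objective values for one solution (e.g. speed, efficiency):
objectives = (3.0, 1.0)

# Different preferences yield different scalar targets for the same solution.
balanced = scalarize(objectives, (0.5, 0.5))   # 2.0
speed_first = scalarize(objectives, (0.9, 0.1))  # 2.8
```

Conditioning the mutation on the sampled preference is what lets a single gradient-based operator cover many regions of the non-dominated front instead of only its extremes.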