Objectives Are All You Need: Solving Deceptive Problems Without Explicit Diversity Maintenance

Deceptive domains have long been a challenge in machine learning because search algorithms get stuck at sub-optimal local optima. Many algorithms have been proposed to navigate these domains by explicitly maintaining diversity or, equivalently, promoting exploration, such as Novelty Search and other so-called Quality Diversity algorithms. In this paper, we present a promising approach to solving deceptive domains without explicit diversity maintenance by optimizing a potentially large set of defined objectives. These objectives can be extracted directly from the environment by sub-aggregating the raw performance of individuals in a variety of ways. We use lexicase selection to optimize for these objectives, as it has been shown to implicitly maintain population diversity. We compare this technique, with a varying number of objectives, to a commonly used quality diversity algorithm, MAP-Elites, on a set of discrete optimization and reinforcement learning domains with varying degrees of deception. We find that decomposing the overall objective into many objectives and optimizing them outperforms MAP-Elites on the deceptive domains that we explore. Furthermore, we find that this technique results in competitive performance on the diversity-focused metrics of QD-Score and Coverage, without explicitly optimizing for them. Our ablation study shows that this technique is robust to different sub-aggregation techniques. However, on non-deceptive, or ``illumination'', domains, quality diversity techniques generally outperform our objective-based framework with respect to exploration (but not exploitation), hinting at potential directions for future work.
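
As a rough illustration of the two ingredients named above, the following minimal sketch sub-aggregates a raw performance trace into several objectives and then picks a parent with lexicase selection. It makes several assumptions that the abstract does not state: objectives are formed by summing contiguous chunks of a per-step score sequence (only one of the "variety of ways" one could sub-aggregate), all objectives are maximized, and survivors must match the best value exactly. The names `sub_aggregate` and `lexicase_select` are hypothetical and are not taken from the authors' code.

```python
import random


def sub_aggregate(raw_scores, num_objectives):
    """Sum contiguous chunks of raw per-step scores into `num_objectives` values.

    Assumption: chunked summation is used here purely for illustration.
    """
    n = len(raw_scores)
    # Integer chunk boundaries that cover the whole sequence.
    bounds = [(i * n) // num_objectives for i in range(num_objectives + 1)]
    return [sum(raw_scores[bounds[i]:bounds[i + 1]]) for i in range(num_objectives)]


def lexicase_select(population, objective_vectors, rng=random):
    """Select one individual by lexicase selection (maximization).

    Objectives are considered one at a time in random order; at each step only
    the candidates that are best on the current objective are kept.
    """
    candidates = list(range(len(population)))
    order = list(range(len(objective_vectors[0])))
    rng.shuffle(order)
    for obj in order:
        best = max(objective_vectors[i][obj] for i in candidates)
        candidates = [i for i in candidates if objective_vectors[i][obj] == best]
        if len(candidates) == 1:
            break
    # Any remaining ties are broken uniformly at random.
    return population[rng.choice(candidates)]


if __name__ == "__main__":
    # Hypothetical per-step scores for three individuals.
    traces = {
        "a": [1, 0, 0, 0, 0, 0],
        "b": [0, 0, 0, 0, 0, 1],
        "c": [0, 0, 0, 0, 0, 0],
    }
    names = list(traces)
    vectors = [sub_aggregate(traces[name], 3) for name in names]
    print(lexicase_select(names, vectors))
```

In this toy population, "a" and "b" each excel on a different objective, so both can be selected depending on the shuffled objective order, while "c" can only survive a filter in a tie and is eliminated before the loop ends. This is the sense in which selection over many objectives can preserve specialists without an explicit diversity mechanism.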
