A suite of diagnostic metrics for characterizing selection schemes

Benchmark suites are crucial for assessing the performance of evolutionary algorithms, but the constituent problems are often too complex to provide clear intuition about an algorithm's strengths and weaknesses. To address this gap, we introduce DOSSIER ("Diagnostic Overview of Selection Schemes In Evolutionary Runs"), a diagnostic suite initially composed of eight handcrafted metrics. These metrics are designed to empirically measure specific capacities for exploitation, exploration, and their interactions. We consider exploitation both with and without constraints, and we divide exploration into two aspects: diversity exploration (the ability to simultaneously explore multiple pathways) and valley-crossing exploration (the ability to cross increasingly wide fitness valleys). We apply DOSSIER to six popular selection schemes: truncation, tournament, fitness sharing, lexicase, nondominated sorting, and novelty search. Our results confirm that simple schemes (e.g., tournament and truncation) emphasize exploitation. For more sophisticated schemes, however, our diagnostics reveal interesting dynamics. Lexicase selection performed moderately well across all diagnostics that did not incorporate valley crossing, but faltered dramatically whenever valleys were present, performing even worse than random search. Fitness sharing was the only scheme to contend effectively with valley crossing, but it struggled with the other diagnostics. Our study highlights the utility of diagnostics for gaining nuanced insights into selection scheme characteristics, which can inform the design of new selection methods.
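
As a rough illustration of why the two simplest schemes named above lean toward exploitation, the following is a minimal Python sketch of truncation and tournament selection. It is not the DOSSIER implementation: the function names, parameters, and the assumption that fitness is a single number to be maximized are all choices made for this example.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def truncation_select(
    population: Sequence[T],
    fitness: Callable[[T], float],
    n_parents: int,
    n_offspring: int,
) -> List[T]:
    """Keep the n_parents fittest individuals and cycle through them
    until n_offspring parent slots are filled (assumed convention)."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:n_parents]
    return [elite[i % len(elite)] for i in range(n_offspring)]


def tournament_select(
    population: Sequence[T],
    fitness: Callable[[T], float],
    tournament_size: int,
    n_offspring: int,
    rng: random.Random,
) -> List[T]:
    """For each parent slot, sample tournament_size individuals with
    replacement and keep the fittest of that sample."""
    winners = []
    for _ in range(n_offspring):
        contestants = [rng.choice(population) for _ in range(tournament_size)]
        winners.append(max(contestants, key=fitness))
    return winners


if __name__ == "__main__":
    # Toy population: each individual is its own fitness value.
    pop = list(range(10))
    identity = lambda x: float(x)
    print(truncation_select(pop, identity, n_parents=2, n_offspring=4))
    print(tournament_select(pop, identity, 7, 4, random.Random(0)))
```

Both functions choose parents using only a scalar fitness value, so selection pressure always points toward the current best individuals. In the tournament sketch, a larger `tournament_size` strengthens that pressure, while a size of 1 reduces to uniform random sampling.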
