From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms

Estimation-of-distribution algorithms (EDAs) are optimization algorithms that learn a distribution on the search space from which good solutions can be sampled easily. A key parameter of most EDAs is the sample size (population size). If the population size is too small, the update of the probabilistic model builds on few samples, leading to the undesired effect of genetic drift. Population sizes that are too large avoid genetic drift but slow down the optimization process. Building on a recent quantitative analysis of how the population size leads to genetic drift, we design a smart-restart mechanism for EDAs. By stopping runs when the risk of genetic drift is high, it automatically runs the EDA in good parameter regimes. Via a mathematical runtime analysis, we prove a general performance guarantee for this smart-restart scheme. In particular, this shows that in many situations where the optimal (problem-specific) parameter values are known, the restart scheme automatically finds these values, leading to asymptotically optimal performance. We also conduct an extensive experimental analysis. On four classic benchmark problems, we clearly observe the critical influence of the population size on the performance, and we find that the smart-restart scheme achieves a performance close to that obtainable with optimal parameter values. Our results also show that previous theory-based suggestions for the optimal population size can be far from the optimal values, leading to a performance clearly inferior to that of the smart-restart scheme. We also conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial optimization problems from the literature, the max-cut problem and the bipartition problem. Again, we observe that the smart-restart mechanism finds much better values for the population size than those suggested in the literature, leading to much better performance.
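
The restart idea can be illustrated with a minimal Python sketch. It uses the compact genetic algorithm (cGA) on OneMax as a stand-in EDA; neither is named in the abstract. It assumes that each run gets a budget of b·K² iterations, K being the cGA's population-size parameter, on the premise that genetic drift of a neutral frequency typically becomes noticeable on that time scale. The function names and the defaults b = 8, U = 2, and K0 = 2 are hypothetical choices for illustration, not the paper's exact specification.

```python
import random
from typing import Callable, List, Optional, Tuple

Fitness = Callable[[List[int]], int]


def onemax(x: List[int]) -> int:
    """OneMax: number of one-bits (the optimum on n bits has value n)."""
    return sum(x)


def cga_run(f: Fitness, n: int, K: float, budget: int, optimum: int,
            rng: random.Random) -> Tuple[bool, int]:
    """Run the compact GA (cGA) with hypothetical population size K for at most
    `budget` iterations. Returns (optimum_found, fitness_evaluations_used)."""
    lo, hi = 1.0 / n, 1.0 - 1.0 / n  # margins keep both bit values reachable
    p = [0.5] * n                    # one sampling frequency per bit
    evals = 0
    for _ in range(budget):
        # Sample two solutions independently from the product distribution.
        x = [1 if rng.random() < pi else 0 for pi in p]
        y = [1 if rng.random() < pi else 0 for pi in p]
        fx, fy = f(x), f(y)
        evals += 2
        if fx == optimum or fy == optimum:
            return True, evals
        if fy > fx:
            x, y = y, x  # x is now the winner of the duel
        # Shift each frequency by 1/K toward the winner where the two differ.
        for i in range(n):
            if x[i] != y[i]:
                step = 1.0 / K if x[i] == 1 else -1.0 / K
                p[i] = min(hi, max(lo, p[i] + step))
    return False, evals


def smart_restart_cga(f: Fitness, n: int, optimum: int, b: float = 8.0,
                      U: float = 2.0, K0: float = 2.0, max_evals: int = 10**7,
                      seed: Optional[int] = None) -> Optional[Tuple[float, int]]:
    """Smart-restart wrapper (sketch): run the cGA for b*K^2 iterations, the
    assumed time scale on which genetic drift becomes likely; if the optimum is
    not found, restart from scratch with K multiplied by U. Returns (K of the
    successful run, evaluations summed over all runs), or None once max_evals
    is exhausted."""
    rng = random.Random(seed)
    K, total = K0, 0
    while total < max_evals:
        budget = int(b * K * K)  # assumption: drift time of a neutral bit ~ K^2
        found, used = cga_run(f, n, K, budget, optimum, rng)
        total += used
        if found:
            return K, total
        K *= U  # run stopped: drift risk was high, so try a larger population
    return None


if __name__ == "__main__":
    print(smart_restart_cga(onemax, n=20, optimum=20, seed=1))
```

Calling `smart_restart_cga(onemax, n=20, optimum=20, seed=1)` returns the population size of the successful run together with the total number of fitness evaluations spent across all runs. With U = 2 the per-run budget b·K² grows by a factor of 4 per restart, so in this sketch all unsuccessful earlier runs together use less than one third of the last run's budget. This is why restarting costs only a constant factor over running directly with a suitable K.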
