Estimation-of-distribution algorithms (EDAs) are optimization algorithms that learn a distribution on the search space from which good solutions can be sampled easily. A key parameter of most EDAs is the sample size (population size). If the population size is too small, the update of the probabilistic model builds on few samples, leading to the undesired effect of genetic drift. Overly large population sizes avoid genetic drift but slow down the process. Building on a recent quantitative analysis of how the population size leads to genetic drift, we design a smart-restart mechanism for EDAs. By stopping runs when the risk of genetic drift is high, it automatically runs the EDA in good parameter regimes. Via a mathematical runtime analysis, we prove a general performance guarantee for this smart-restart scheme. In particular, this shows that in many situations where the optimal (problem-specific) parameter values are known, the restart scheme automatically finds them, leading to asymptotically optimal performance. We also conduct an extensive experimental analysis. On four classic benchmark problems, we clearly observe the critical influence of the population size on the performance, and we find that the smart-restart scheme yields performance close to that obtainable with optimal parameter values. Our results also show that previous theory-based suggestions for the optimal population size can be far from the optimal values, leading to performance clearly inferior to that of the smart-restart scheme. We also conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial optimization problems from the literature, the max-cut problem and the bipartition problem. Again, we observe that the smart-restart mechanism finds much better values for the population size than those suggested in the literature, leading to much better performance.
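To make the restart idea concrete, here is a minimal sketch in Python. It is not the authors' implementation and rests on several assumptions not stated in the abstract. The EDA is the compact genetic algorithm (cGA) with hypothetical population size `K`. The test function is OneMax. The "high risk of genetic drift" point is modeled as a budget of `b * K**2` iterations. If a run exhausts that budget without finding the optimum, it is stopped and restarted with `K` multiplied by an update factor `U`. The names `cga_run` and `smart_restart_cga` and the defaults `b=8`, `U=2`, `K0=2` are hypothetical illustration choices.

```python
import random


def onemax(x):
    """Number of one-bits; the maximum value is len(x)."""
    return sum(x)


def cga_run(n, K, max_iters, fitness, optimum_value, rng):
    """Run the compact GA with hypothetical population size K for at most
    max_iters iterations. Returns (found_optimum, fitness_evaluations)."""
    lo, hi = 1.0 / n, 1.0 - 1.0 / n  # usual frequency margins [1/n, 1 - 1/n]
    p = [0.5] * n                     # frequency vector of the model
    evals = 0
    for _ in range(max_iters):
        # Sample two candidate solutions from the product distribution p.
        x = [1 if rng.random() < pi else 0 for pi in p]
        y = [1 if rng.random() < pi else 0 for pi in p]
        fx, fy = fitness(x), fitness(y)
        evals += 2
        if fx < fy:                   # make x the winner
            x, y, fx, fy = y, x, fy, fx
        if fx == optimum_value:
            return True, evals
        # Shift each frequency by 1/K toward the winner where the two differ.
        for i in range(n):
            if x[i] != y[i]:
                p[i] = min(hi, max(lo, p[i] + (x[i] - y[i]) / K))
    return False, evals


def smart_restart_cga(n, fitness, optimum_value, b=8, U=2, K0=2,
                      seed=0, max_rounds=40):
    """Restart scheme (assumed form): run with population size K for
    b*K^2 iterations, the assumed point where drift risk becomes high;
    on failure, multiply K by U and start a fresh run.
    Returns (K that succeeded or None, total fitness evaluations)."""
    rng = random.Random(seed)
    K, total = K0, 0
    for _ in range(max_rounds):
        found, evals = cga_run(n, K, int(b * K * K), fitness,
                               optimum_value, rng)
        total += evals
        if found:
            return K, total
        K *= U
    return None, total


if __name__ == "__main__":
    K, total = smart_restart_cga(20, onemax, 20, seed=1)
    print("succeeded with K =", K, "after", total, "evaluations")
```

The design choice illustrated here is that no population size has to be guessed in advance. Small values are tried first with short budgets, so the cost of failed runs stays below that of the final successful run.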