Sharp Bounds for Genetic Drift in Estimation of Distribution Algorithms

Estimation of Distribution Algorithms (EDAs) are a branch of Evolutionary Algorithms (EAs) in the broad sense: they evolve a probabilistic model instead of a population. Many existing algorithms fall into this category. Analogous to genetic drift in EAs, EDAs also encounter the phenomenon that updates of the probabilistic model that are not justified by the fitness move the sampling frequencies toward the boundary values. This can result in a considerable performance loss. This paper proves the first sharp estimates of the boundary hitting time of the sampling frequency of a neutral bit for several univariate EDAs. For the UMDA that selects the $\mu$ best individuals from $\lambda$ offspring in each generation, we prove that the expected first iteration in which the frequency of the neutral bit leaves the middle range $[\tfrac 14, \tfrac 34]$ and the expected first time it is absorbed in 0 or 1 are both $\Theta(\mu)$. The corresponding hitting times are $\Theta(K^2)$ for the cGA with hypothetical population size $K$. This paper further proves that for PBIL with parameters $\mu$, $\lambda$, and $\rho$, the sampling frequency of a neutral bit leaves the interval $[\Theta(\rho/\mu),1-\Theta(\rho/\mu)]$ within an expected number of $\Theta(\mu/\rho^2)$ iterations, and from then on the same value is always sampled for this bit, that is, the frequency approaches the corresponding boundary value with maximum speed. For the lower bounds implicit in these statements, we also show exponential tail bounds. If a bit is not necessarily neutral, but is either neutral or has a preference for ones, then the lower bounds on the times to reach a low frequency value still hold. An analogous statement holds for bits that are neutral or prefer the value zero.
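
The drift of a neutral bit described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experimental setup: the function names (`umda_neutral_exit_time`, `cga_neutral_exit_time`, `mean_exit_time`) are hypothetical, frequency borders are omitted, and the start frequency is assumed to be $\tfrac 12$ with $\mu$ and $K$ even. It relies on the modeling assumption that selection is independent of a neutral bit, so the $\mu$ selected UMDA individuals carry independent samples of that bit, and the cGA frequency moves by $\pm 1/K$ exactly when its two samples differ in that bit.

```python
import random


def umda_neutral_exit_time(mu, rng, lo=0.25, hi=0.75, max_iters=10**6):
    """Iterations until a neutral bit's UMDA frequency leaves [lo, hi].

    Assumption: selection ignores the neutral bit, so the number of ones
    among the mu selected individuals is Binomial(mu, p).
    """
    ones = mu // 2  # start at frequency 1/2 (mu assumed even)
    for t in range(1, max_iters + 1):
        p = ones / mu
        # New frequency is the fraction of ones among the mu selected samples.
        ones = sum(rng.random() < p for _ in range(mu))
        if not (lo <= ones / mu <= hi):
            return t
    return max_iters  # cap reached without leaving the range


def cga_neutral_exit_time(K, rng, lo=0.25, hi=0.75, max_iters=10**6):
    """Iterations until a neutral bit's cGA frequency leaves [lo, hi].

    The frequency is stored as an integer multiple of 1/K to avoid
    accumulating floating-point error.
    """
    steps = K // 2  # frequency = steps / K, start at 1/2 (K assumed even)
    for t in range(1, max_iters + 1):
        p = steps / K
        winner_bit = rng.random() < p  # neutral bit of the better sample
        loser_bit = rng.random() < p   # neutral bit of the worse sample
        steps += int(winner_bit) - int(loser_bit)  # change of -1, 0, or +1
        if not (lo <= steps / K <= hi):
            return t
    return max_iters  # cap reached without leaving the range


def mean_exit_time(simulate, size, runs=50, seed=0):
    """Average exit time of `simulate(size, rng)` over independent runs."""
    rng = random.Random(seed)
    return sum(simulate(size, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for mu in (20, 40, 80):
        print("UMDA mu =", mu, "mean exit time:",
              mean_exit_time(umda_neutral_exit_time, mu))
    for K in (10, 20, 40):
        print("cGA  K  =", K, "mean exit time:",
              mean_exit_time(cga_neutral_exit_time, K))
```

If the $\Theta(\mu)$ and $\Theta(K^2)$ statements are read as growth rates, doubling $\mu$ should roughly double the averaged UMDA exit time and doubling $K$ should roughly quadruple the averaged cGA exit time, up to constant factors and sampling noise. The sketch only covers leaving the middle range $[\tfrac 14, \tfrac 34]$; it does not model PBIL or the tail bounds.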
