Most research on estimation-of-distribution algorithms (EDAs) concentrates on pseudo-Boolean optimization and permutation problems. EDAs for problems in which the decision variables can take more than two values, but which are not permutation problems, remain mostly unexplored. To render this domain more accessible, we propose a natural way to extend the known univariate EDAs to this setting. Unlike a naive reduction to the binary case, our approach avoids additional constraints. Since understanding genetic drift is crucial for an optimal parameter choice, we extend the known quantitative analysis of genetic drift to EDAs for multi-valued variables. Roughly speaking, when the variables take $r$ different values, the time for genetic drift to become significant is shorter by a factor of $r$ than in the binary case. Consequently, the update strength of the probabilistic model has to be chosen lower by a factor of $r$. To investigate how desired model updates take place in this framework, we undertake a mathematical runtime analysis on the $r$-valued LeadingOnes problem. We prove that with the right parameters, the multi-valued UMDA solves this problem efficiently in $O(r\ln(r)^2 n^2 \ln(n))$ function evaluations. This bound is nearly tight, as our lower bound of $\Omega(r\ln(r) n^2 \ln(n))$ shows. Overall, our work shows that our good understanding of binary EDAs naturally extends to the multi-valued setting, and it gives advice on how to set the main parameters of multi-valued EDAs.
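To make the setting concrete, the following is a minimal Python sketch of a univariate EDA for $r$-valued variables in the spirit of the multi-valued UMDA described above. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the abstract does not give the details, so the function names (`r_leading_ones`, `restrict`, `r_umda`), the convention that the target value at each position is 0, the frequency borders $1/((r-1)n)$ and $1 - 1/n$, and the clip-and-renormalize step are all assumptions made for this example.

```python
import random


def r_leading_ones(x):
    """Count leading positions that hold the target value.

    Assumption: the target value is 0 at every position.
    """
    count = 0
    for value in x:
        if value != 0:
            break
        count += 1
    return count


def restrict(row, lo, hi):
    """Clip one frequency vector to [lo, hi], then rescale it to sum to 1.

    Assumption: this clip-and-renormalize step is a simple heuristic that
    keeps every value samplable; it is not taken from the paper.
    """
    clipped = [min(max(p, lo), hi) for p in row]
    total = sum(clipped)
    return [p / total for p in clipped]


def r_umda(n, r, mu, lam, max_gens, rng):
    """Sketch of a UMDA over {0, ..., r-1}^n (requires r >= 2, n >= 2).

    Returns (evaluations used, final frequencies); the first entry is
    None if no optimum was sampled within max_gens generations.
    """
    lo = 1.0 / ((r - 1) * n)  # assumed lower border
    hi = 1.0 - 1.0 / n        # assumed upper border
    values = list(range(r))
    # One frequency vector per position, initially uniform over the r values.
    freqs = [[1.0 / r] * r for _ in range(n)]
    evals = 0
    for _ in range(max_gens):
        # Sample lam individuals, each position independently.
        pop = [
            [rng.choices(values, weights=freqs[i])[0] for i in range(n)]
            for _ in range(lam)
        ]
        evals += lam
        # Stable sort: ties keep their (random) sampling order.
        pop.sort(key=r_leading_ones, reverse=True)
        if r_leading_ones(pop[0]) == n:
            return evals, freqs
        selected = pop[:mu]
        # New frequencies are the relative value counts among the mu best.
        for i in range(n):
            counts = [0] * r
            for x in selected:
                counts[x[i]] += 1
            freqs[i] = restrict([c / mu for c in counts], lo, hi)
    return None, freqs


if __name__ == "__main__":
    evals, _ = r_umda(n=10, r=4, mu=60, lam=180, max_gens=500,
                      rng=random.Random(0))
    print("function evaluations until an optimum was sampled:", evals)
```

In this sketch, the selection size `mu` governs how strongly a single generation can move the frequencies, so a larger `mu` corresponds to a weaker update. Under that reading, the guidance that the update strength should be lower by a factor of $r$ suggests scaling `mu` up with `r`; the concrete values in the usage example are arbitrary and not tuned.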