We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation trade-offs made by different search heuristics. Since many search heuristics work by comparing different options (called "individuals" in evolutionary algorithms and "arms" in the Bandit literature), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is an arm that beats every other arm with probability strictly higher than $1/2$. We show that evolutionary algorithms perform poorly at identifying the Condorcet winner: even if the Condorcet winner beats every other arm with probability $1-p$, the (1+1) EA, in its stationary distribution, chooses the Condorcet winner with only constant probability if $p=\Omega(1/n)$. By contrast, we show that a simple EDA (based on the Max-Min Ant System with iteration-best update) chooses the Condorcet winner in its maintained distribution with probability $1-\Theta(p)$. As a remedy for the (1+1) EA, we show how repeated duels can significantly boost the probability of the Condorcet winner in the stationary distribution.
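To make the effect of repeated duels concrete, the following is a minimal sketch under simplifying assumptions that are ours and not necessarily the model analyzed here: $n$ is taken to be the number of arms, the (1+1) EA holds one arm and in each iteration draws a challenger uniformly from the other arms, and the challenger replaces the current arm only if it wins a strict majority of `duels` independent duels. The names `majority_win_prob` and `stationary_winner_prob` are hypothetical. Under these assumptions the process collapses to a two-state Markov chain (holding the Condorcet winner or not), whose stationary distribution can be computed exactly.

```python
from math import comb


def majority_win_prob(win_prob: float, duels: int) -> float:
    """Probability of winning a strict majority of `duels` independent duels,
    each won with probability `win_prob`."""
    need = duels // 2 + 1  # smallest number of wins that is a strict majority
    return sum(
        comb(duels, j) * win_prob**j * (1 - win_prob) ** (duels - j)
        for j in range(need, duels + 1)
    )


def stationary_winner_prob(n_arms: int, p: float, duels: int = 1) -> float:
    """Stationary probability of holding the Condorcet winner in the
    assumed two-state chain (illustrative model, not the paper's proof)."""
    # Leaving the winner: every challenger is a non-winner, which wins a
    # single duel against the Condorcet winner with probability p.
    leave = majority_win_prob(p, duels)
    # Reaching the winner: the challenger is the Condorcet winner with
    # probability 1/(n_arms - 1) and wins a single duel with probability 1 - p.
    enter = majority_win_prob(1 - p, duels) / (n_arms - 1)
    # Stationary distribution of a two-state chain.
    return enter / (enter + leave)


if __name__ == "__main__":
    n_arms, p = 100, 0.05  # p = 5/n, i.e. in the regime p = Omega(1/n)
    for duels in (1, 3, 5):
        prob = stationary_winner_prob(n_arms, p, duels)
        print(f"duels={duels}: stationary P(winner) = {prob:.3f}")
```

In this toy model, a single duel with $p = c/n$ gives a stationary probability of roughly $1/(1+c)$, which stays bounded away from 1, while a majority vote over several duels shrinks the probability of leaving the Condorcet winner much faster than the probability of reaching it.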