Fixed-budget best-arm identification (BAI) is a bandit problem in which the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument in which we carry the prior through the analysis and integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination outperforms frequentist methods and is competitive with state-of-the-art Bayesian algorithms that have no guarantees in our setting.
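To make the idea of elimination under a prior concrete, the following is a minimal sketch of one way a fixed-budget Bayesian elimination scheme could look. It is an illustration under stated assumptions, not a specification of the algorithm analyzed in this work: it assumes Gaussian rewards with known noise variance, independent Gaussian priors with a shared variance, an even split of the budget over roughly $\log_2 K$ rounds, and elimination of the half of the active arms with the lowest posterior means. The function name `bayes_elim_sketch` and the `pull` callback are hypothetical.

```python
import math
import random


def bayes_elim_sketch(pull, prior_means, prior_var, noise_var, budget):
    """Return the index of the arm believed to be best.

    Assumptions (illustrative only): Gaussian rewards with known variance
    `noise_var`, independent Gaussian priors N(prior_means[i], prior_var),
    and halving of the active set after each round.
    `pull(i)` is a hypothetical callback returning one reward of arm i.
    """
    num_arms = len(prior_means)
    active = list(range(num_arms))
    num_rounds = max(1, math.ceil(math.log2(num_arms)))
    per_round = budget // num_rounds  # observations available in each round
    sums = [0.0] * num_arms
    counts = [0] * num_arms

    def posterior_mean(i):
        # Conjugate Gaussian update: precision-weighted average of the
        # prior mean and the observed rewards of arm i.
        precision = 1.0 / prior_var + counts[i] / noise_var
        return (prior_means[i] / prior_var + sums[i] / noise_var) / precision

    for _round in range(num_rounds):
        if len(active) == 1:
            break
        # Pull every active arm equally often. At least one pull per arm is
        # forced, so a very small budget can be slightly exceeded.
        pulls_per_arm = max(1, per_round // len(active))
        for i in active:
            for _pull in range(pulls_per_arm):
                sums[i] += pull(i)
                counts[i] += 1
        # Keep the half of the active arms with the highest posterior means.
        active.sort(key=posterior_mean, reverse=True)
        active = active[: math.ceil(len(active) / 2)]

    return active[0]


if __name__ == "__main__":
    # Usage: sample a bandit instance from the prior, then run the sketch.
    rng = random.Random(0)
    prior = [0.0, 0.2, 0.4, 0.6]
    theta = [rng.gauss(m, 0.5) for m in prior]
    guess = bayes_elim_sketch(
        lambda i: rng.gauss(theta[i], 1.0), prior, 0.25, 1.0, budget=400
    )
    print("true best:", max(range(4), key=lambda i: theta[i]), "guess:", guess)
```

The posterior mean shrinks each arm's empirical mean toward its prior mean, so an informative prior can compensate for a small budget. This is the intuition behind a bound that reflects the quality of the prior.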