Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide the first lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches the lower bound. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
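To make the idea of elimination with a prior concrete, the following is a minimal sketch, not the algorithm analyzed in this work. It assumes Gaussian rewards with a known noise variance, an independent Gaussian prior on each arm's mean, a budget split equally across roughly $\log_2 K$ rounds, and elimination of the half of the arms with the lowest posterior means after each round. The function name `bayes_elimination` and all of its parameters are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def bayes_elimination(pull, prior_mean, prior_var, noise_var, budget):
    """Return the index of the arm that survives all elimination rounds.

    pull(arm) must return one noisy reward for the given arm (assumed Gaussian).
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    active = list(range(len(prior_mean)))
    rounds = max(1, math.ceil(math.log2(len(active))))
    per_round = budget // rounds  # equal share of the budget per round

    while len(active) > 1:
        # Pulls per surviving arm; at least one, so tiny budgets may be exceeded.
        n = max(1, per_round // len(active))
        post_mean = {}
        for arm in active:
            # Assumption: only this round's samples are combined with the prior.
            total = sum(pull(arm) for _ in range(n))
            precision = 1.0 / prior_var[arm] + n / noise_var
            post_mean[arm] = (
                prior_mean[arm] / prior_var[arm] + total / noise_var
            ) / precision
        # Keep the half of the arms with the largest posterior means.
        active.sort(key=lambda a: post_mean[a], reverse=True)
        active = active[: math.ceil(len(active) / 2)]
    return active[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = np.array([0.0, 0.2, 1.0, 0.4])  # hypothetical bandit instance
    guess = bayes_elimination(
        pull=lambda a: rng.normal(true_means[a], 1.0),
        prior_mean=np.zeros(4),
        prior_var=np.ones(4),
        noise_var=1.0,
        budget=400,
    )
    print("identified arm:", guess)
```

In this sketch the prior enters only through the conjugate Gaussian posterior mean, so a tight, accurate prior can decide a round even when few samples are available, while a flat prior makes the rule behave much like a frequentist halving scheme based on sample means.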