We study the problem of best-arm identification with a fixed budget in stochastic two-armed bandits with Bernoulli rewards. We prove that there is no algorithm that (i) performs at least as well as the algorithm sampling each arm equally (referred to as the {\it uniform sampling} algorithm) on all instances, and (ii) strictly outperforms this algorithm on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. To establish this result, we first introduce the natural class of {\it consistent} and {\it stable} algorithms, and show that any algorithm that performs at least as well as the uniform sampling algorithm on all instances belongs to this class. The proof then proceeds by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results resolve the two open problems presented in \cite{qin2022open}.
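For concreteness, the uniform sampling baseline can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the construction used in the proofs: the function names `uniform_sampling` and `error_rate`, the random tie-breaking rule, and the Monte Carlo estimate of the error probability are all hypothetical choices made for this sketch.

```python
import random


def uniform_sampling(means, budget, rng):
    """Pull each of the two Bernoulli arms budget // 2 times and
    return the index of the arm with the larger empirical reward.

    Ties are broken uniformly at random (an assumption of this sketch).
    """
    pulls = budget // 2
    # Number of successes observed on each arm.
    totals = [sum(1 for _ in range(pulls) if rng.random() < mu) for mu in means]
    if totals[0] == totals[1]:
        return rng.randrange(2)
    return 0 if totals[0] > totals[1] else 1


def error_rate(means, budget, runs=2000, seed=0):
    """Monte Carlo estimate of the probability that uniform sampling
    recommends a suboptimal arm on the instance `means`."""
    rng = random.Random(seed)
    best = max(range(2), key=lambda a: means[a])
    errors = sum(uniform_sampling(means, budget, rng) != best for _ in range(runs))
    return errors / runs


if __name__ == "__main__":
    # Estimated error probability for growing budgets on one instance.
    for budget in (20, 50, 100, 200):
        print(budget, error_rate((0.6, 0.4), budget))
```

On a fixed instance, the estimated error probability should shrink as the budget grows; the exponential rate of this decay is the quantity that the lower bound in the abstract concerns.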