We study Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations determined by the prior information and the structure of the environment. We provide theoretical bounds on its performance across diverse models, including the first prior-dependent upper bounds for linear and hierarchical BAI. A key contribution is a set of new proof techniques that yield tighter bounds for multi-armed BAI than existing methods. We extensively compare our approach to other fixed-budget BAI methods and find that it performs consistently and robustly in various settings. This work advances the understanding of Bayesian fixed-budget BAI in structured bandits and highlights the effectiveness of our approach in practical scenarios.
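To make the idea of a fixed allocation concrete, the following is a minimal sketch for the simplest case, a multi-armed bandit with independent Gaussian priors and Gaussian rewards of known variance. It is an illustration under stated assumptions, not the algorithm analyzed in this work: the abstract does not specify the allocation rule, so the per-arm pull counts are supplied by the caller, and the function names `posterior_means` and `fixed_allocation_bai` are hypothetical.

```python
import numpy as np


def posterior_means(prior_mean, prior_var, noise_var, counts, reward_sums):
    """Conjugate Gaussian update, applied independently to each arm."""
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    counts = np.asarray(counts, dtype=float)
    reward_sums = np.asarray(reward_sums, dtype=float)
    # Posterior precision = prior precision + number of pulls / noise variance.
    post_precision = 1.0 / prior_var + counts / noise_var
    return (prior_mean / prior_var + reward_sums / noise_var) / post_precision


def fixed_allocation_bai(prior_mean, prior_var, noise_var, allocation, pull):
    """Pull each arm a pre-set number of times, then pick the best posterior mean.

    `allocation[i]` is the number of pulls of arm i, fixed before any reward
    is observed (assumption: chosen by the caller; the budget is its sum).
    `pull(i)` returns one reward sample from arm i.
    """
    reward_sums = [sum(pull(i) for _ in range(n)) for i, n in enumerate(allocation)]
    post = posterior_means(prior_mean, prior_var, noise_var, allocation, reward_sums)
    return int(np.argmax(post)), post
```

For example, with `rng = np.random.default_rng(0)` and true means `mu`, one could pass `pull=lambda i: rng.normal(mu[i], 1.0)` and an allocation that gives more pulls to arms with larger prior variance. Structured settings such as linear or hierarchical bandits would replace the per-arm update with a joint posterior over shared parameters.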