This study investigates minimax and Bayes optimal strategies for fixed-budget best-arm identification. We consider an adaptive procedure consisting of a sampling phase followed by a recommendation phase, and we design an adaptive experiment within this framework to efficiently identify the best arm, defined as the one with the highest expected outcome. In our proposed strategy, the sampling phase consists of two stages. The first stage is a pilot phase, in which we allocate samples uniformly across arms to eliminate clearly suboptimal arms and to estimate outcome variances. Before entering the second stage, we solve a Gaussian minimax game, which yields a sampling ratio and a decision rule. In the second stage, samples are allocated according to this sampling ratio. After the sampling phase, the procedure enters the recommendation phase, where we select an arm using the decision rule. We prove that this single strategy is simultaneously asymptotically minimax and Bayes optimal for the simple regret, and we establish upper bounds that coincide exactly with our lower bounds, including the constant terms.
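To make the two-stage structure described above concrete, the following is a minimal Python sketch, not the strategy analyzed in the paper. The abstract does not specify the Gaussian minimax game, so the sketch substitutes two simple placeholders: a sampling ratio proportional to the estimated standard deviations, and a decision rule that recommends the arm with the highest empirical mean. The names `two_stage_bai`, `pull`, `pilot_frac`, and `z`, as well as the confidence-interval elimination rule, are hypothetical choices made for illustration.

```python
import math
import statistics


def two_stage_bai(pull, n_arms, budget, pilot_frac=0.2, z=3.0):
    """Illustrative two-stage fixed-budget best-arm identification.

    pull(arm) returns one observed outcome for the given arm index.
    Returns (recommended_arm, samples_drawn_per_arm).
    """
    if budget < 2 * n_arms:
        raise ValueError("budget must allow at least two pilot samples per arm")

    # Stage 1 (pilot): allocate samples uniformly across all arms.
    n_pilot = max(2, int(pilot_frac * budget) // n_arms)
    samples = [[pull(a) for _ in range(n_pilot)] for a in range(n_arms)]
    means = [statistics.fmean(s) for s in samples]
    stds = [max(statistics.stdev(s), 1e-8) for s in samples]

    # Eliminate clearly suboptimal arms (assumed rule): drop an arm if its
    # upper confidence bound lies below the best lower confidence bound.
    half = [z * sd / math.sqrt(n_pilot) for sd in stds]
    best_lower = max(m - h for m, h in zip(means, half))
    active = [a for a in range(n_arms) if means[a] + half[a] >= best_lower]

    # Placeholder for the Gaussian minimax game (assumption): sampling ratio
    # proportional to the estimated standard deviations of the active arms.
    total_sd = sum(stds[a] for a in active)
    ratio = {a: stds[a] / total_sd for a in active}

    # Stage 2: spend the remaining budget according to the sampling ratio.
    remaining = budget - n_pilot * n_arms
    for a in active:
        for _ in range(int(remaining * ratio[a])):
            samples[a].append(pull(a))

    # Recommendation phase, placeholder decision rule (assumption):
    # pick the active arm with the highest empirical mean.
    final_means = {a: statistics.fmean(samples[a]) for a in active}
    best = max(final_means, key=final_means.get)
    return best, [len(s) for s in samples]
```

For example, `pull` could be a function that draws from `random.gauss(mu[arm], sigma[arm])` for some hypothetical means and standard deviations. Because the per-arm allocations are rounded down, the total number of samples drawn never exceeds `budget`.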