We investigate fixed-budget best arm identification (BAI) for expected simple regret minimization. In each round of an adaptive experiment, a decision maker draws one of multiple treatment arms based on past observations and then observes the outcome of the chosen arm. After the experiment, the decision maker recommends the treatment arm with the highest estimated expected outcome. We evaluate this decision in terms of the expected simple regret, the difference between the expected outcomes of the best and the recommended treatment arms. Because of the inherent uncertainty, we evaluate the regret using the minimax criterion. For distributions with fixed variances (location-shift models), such as Gaussian distributions, we derive asymptotic lower bounds for the worst-case expected simple regret. We then show that the Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy proposed by Kato et al. (2022) is asymptotically minimax optimal, in the sense that the leading factor of its worst-case expected simple regret asymptotically matches our worst-case lower bound. Our result indicates that, for location-shift models, the optimal RS-AIPW strategy draws treatment arms with different probabilities that depend on their variances. This contrasts with the results of Bubeck et al. (2011), who show that drawing each treatment arm in equal proportion is minimax optimal in a bounded outcome setting.
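To make the quantities in the abstract concrete, the following is a minimal simulation sketch of a fixed-budget experiment with Gaussian arms. It is not the RS-AIPW strategy itself: the variances are assumed known rather than estimated, the recommendation uses plain sample means rather than an AIPW estimator, and the variance-proportional allocation rule is only an illustrative assumption standing in for "probabilities that depend on the variances". All function names and default parameters are hypothetical.

```python
import random


def allocation(variances, rule):
    """Return sampling probabilities over arms (hypothetical helper).

    "uniform"  : draw every arm with equal probability.
    "variance" : draw arms in proportion to their variances
                 (an illustrative assumption, not the paper's exact rule).
    """
    k = len(variances)
    if rule == "uniform":
        return [1.0 / k] * k
    if rule == "variance":
        total = sum(variances)
        return [v / total for v in variances]
    raise ValueError(f"unknown rule: {rule}")


def simple_regret(means, recommended):
    """Gap between the best arm's mean and the recommended arm's mean."""
    return max(means) - means[recommended]


def run_experiment(means, variances, probs, budget, rng):
    """Run one fixed-budget experiment and return its simple regret."""
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        # Draw each arm once first so every sample mean is defined,
        # then sample arms at random according to probs.
        arm = t if t < k else rng.choices(range(k), weights=probs)[0]
        sums[arm] += rng.gauss(means[arm], variances[arm] ** 0.5)
        counts[arm] += 1
    estimates = [s / c for s, c in zip(sums, counts)]
    recommended = max(range(k), key=lambda a: estimates[a])
    return simple_regret(means, recommended)


def expected_simple_regret(means, variances, rule,
                           budget=200, reps=500, seed=0):
    """Monte Carlo estimate of the expected simple regret."""
    rng = random.Random(seed)
    probs = allocation(variances, rule)
    total = sum(run_experiment(means, variances, probs, budget, rng)
                for _ in range(reps))
    return total / reps


if __name__ == "__main__":
    means = [0.0, 0.2, 0.1]
    variances = [1.0, 4.0, 1.0]
    for rule in ("uniform", "variance"):
        print(rule, expected_simple_regret(means, variances, rule))
```

Running the script prints one Monte Carlo estimate per allocation rule for a single problem instance. A single instance says nothing about the worst case, so such a comparison only illustrates the definitions and does not verify the minimax claim.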