We study fixed-budget best arm identification (BAI) with the goal of minimizing expected simple regret. In an adaptive experiment, a decision maker draws one of multiple treatment arms based on past observations and observes the outcome of the drawn arm. After the experiment, the decision maker recommends the treatment arm estimated to have the highest expected outcome. We evaluate this decision by the expected simple regret, which is the difference between the expected outcomes of the best arm and the recommended arm. Because of inherent uncertainty, we assess the regret under the minimax criterion, that is, in the worst case. First, we derive asymptotic lower bounds for the worst-case expected simple regret, whose leading factor is characterized by the variances of the potential outcomes. Based on these lower bounds, we propose the Two-Stage (TS)-Hirano-Imbens-Ridder (HIR) strategy, which uses the HIR estimator (Hirano et al., 2003) to recommend the best arm. Our theoretical analysis shows that the TS-HIR strategy is asymptotically minimax optimal: the leading factor of its worst-case expected simple regret matches that of our lower bound. We also consider extensions of our method, such as asymptotic optimality for the probability of misidentification. Finally, we validate the proposed method through simulations.
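To make the notions of simple regret and a two-stage experiment concrete, the following is a minimal simulation sketch, not the TS-HIR strategy itself. The abstract does not specify the stages, so several choices here are assumptions: a uniform pilot stage, a second stage that draws arms with probability proportional to their estimated standard deviations, Gaussian outcomes, and a plain sample mean as a stand-in for the HIR estimator. All function and parameter names (`two_stage_recommend`, `pilot_frac`, and so on) are hypothetical.

```python
import random
import statistics


def two_stage_recommend(means, sds, budget, pilot_frac=0.2, rng=None):
    """Simulate one two-stage experiment and return the recommended arm index.

    Assumptions (not taken from the paper): Gaussian outcomes, a uniform
    pilot stage, and a second-stage allocation proportional to the
    estimated standard deviations.
    """
    rng = rng or random.Random()
    k = len(means)
    draws = [[] for _ in range(k)]

    # Stage 1: uniform pilot allocation, with at least two draws per arm
    # so that a sample standard deviation can be computed.
    pilot_per_arm = max(2, int(pilot_frac * budget) // k)
    for arm in range(k):
        for _ in range(pilot_per_arm):
            draws[arm].append(rng.gauss(means[arm], sds[arm]))

    # Stage 2: spend the remaining budget, drawing each arm with
    # probability proportional to its estimated standard deviation.
    weights = [max(statistics.stdev(d), 1e-12) for d in draws]
    remaining = max(0, budget - k * pilot_per_arm)
    for _ in range(remaining):
        arm = rng.choices(range(k), weights=weights)[0]
        draws[arm].append(rng.gauss(means[arm], sds[arm]))

    # Recommendation: arm with the largest sample mean
    # (a simple stand-in for the HIR estimator).
    est_means = [statistics.mean(d) for d in draws]
    return max(range(k), key=lambda arm: est_means[arm])


def simple_regret(means, recommended):
    """Gap between the best arm's expected outcome and the recommended arm's."""
    return max(means) - means[recommended]


def expected_simple_regret(means, sds, budget, n_runs=2000, seed=0):
    """Monte Carlo estimate of the expected simple regret for one instance."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        rec = two_stage_recommend(means, sds, budget, rng=rng)
        total += simple_regret(means, rec)
    return total / n_runs
```

Calling, for example, `expected_simple_regret([0.1, 0.0], [1.0, 2.0], budget=500)` estimates the regret for a single problem instance. The minimax criterion in the abstract concerns the worst case over instances, which this sketch could only approximate by repeating the call over a grid of mean gaps and taking the maximum.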