Practitioners conducting adaptive experiments often face two competing priorities: reducing the cost of experimentation by assigning treatments effectively during the experiment itself, and gathering information quickly so that the experiment can end and a treatment can be deployed across the population. The literature is currently divided: work on regret minimization addresses the former priority in isolation, while work on best-arm identification focuses solely on the latter. This paper proposes a unified model that accounts for both within-experiment performance and post-experiment outcomes. We then develop a sharp theory of optimal performance in large populations that unifies canonical results in the literature. This unification also yields new insights. For example, the theory shows that familiar algorithms, such as the recently proposed top-two Thompson sampling algorithm, can be adapted to optimize a broad class of objectives by adjusting a single scalar parameter. The theory also shows that enormous reductions in experiment duration can sometimes be achieved with minimal impact on both within-experiment and post-experiment regret.
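The abstract does not spell out the algorithm, so the following is only a minimal sketch of the standard top-two Thompson sampling rule, assuming Bernoulli rewards with independent Beta(1, 1) priors. Each round it draws a posterior sample to pick a "leader", plays that leader with probability `beta`, and otherwise plays a "challenger" found by resampling the posterior until a different arm wins. Treating `beta` as the "single scalar parameter" mentioned above is our assumption; the paper's exact parameterization and its mapping from objectives to parameter values are not reproduced here. The function names `top_two_ts_choose` and `simulate`, the resampling cap, and the fallback rule are hypothetical illustration choices.

```python
import numpy as np


def top_two_ts_choose(successes, failures, beta, rng, max_resamples=1000):
    """Pick one arm with top-two Thompson sampling (Bernoulli arms, Beta(1, 1) priors).

    beta: probability of playing the leader; otherwise a challenger is played.
    """
    a = successes + 1.0  # posterior Beta parameters under a Beta(1, 1) prior
    b = failures + 1.0
    leader = int(np.argmax(rng.beta(a, b)))  # argmax of one posterior sample
    if rng.random() < beta:
        return leader
    # Resample the posterior (in one vectorized batch) until another arm wins.
    draws = rng.beta(a, b, size=(max_resamples, len(a)))
    winners = np.argmax(draws, axis=1)
    other = np.flatnonzero(winners != leader)
    if other.size > 0:
        return int(winners[other[0]])
    # Hypothetical fallback when no challenger appears within the cap:
    # take the best non-leader arm in the last posterior draw.
    last = draws[-1].copy()
    last[leader] = -np.inf
    return int(np.argmax(last))


def simulate(true_means, beta, horizon, seed=0):
    """Run the rule for `horizon` rounds on Bernoulli arms; return pull counts."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        arm = top_two_ts_choose(successes, failures, beta, rng)
        reward = float(rng.random() < true_means[arm])  # Bernoulli outcome
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    means = [0.1, 0.9]
    print("beta = 1.0 (ordinary Thompson sampling):", simulate(means, 1.0, 300))
    print("beta = 0.5:", simulate(means, 0.5, 300))
```

Setting `beta = 1` recovers ordinary Thompson sampling, which concentrates pulls on the apparent best arm and so favors within-experiment reward; smaller values divert more samples to challengers, which helps identify the best arm. With two arms the fallback always returns the other arm, so each round plays the leader with probability exactly `beta`, which makes the role of the scalar easy to see in the printed pull counts.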