Dual-Directed Algorithm Design for Efficient Pure Exploration

From arXiv. An earlier version of this paper appeared as an extended abstract in the Proceedings of the 36th Annual Conference on Learning Theory (COLT'23) under the title "Information-Directed Selection for Top-Two Algorithms."

We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternatives. The central objective is to answer a query regarding the alternatives with high confidence while minimizing measurement efforts. One canonical example is identifying the best-performing alternative, a problem known as ranking and selection in simulation or best-arm identification in machine learning. We formulate the problem complexity measure as a maximin optimization problem for the static fixed-budget, fixed-confidence, and posterior convergence rate settings. By incorporating dual variables directly into the analysis, we derive necessary and sufficient conditions for an allocation's optimality. The introduction of dual variables allows us to sidestep the combinatorial complexity that arises when considering only primal variables. These optimality conditions enable the extension of the top-two algorithm design principle to more general pure-exploration problems. Moreover, our analysis yields a straightforward and effective information-directed selection rule that adaptively chooses from a candidate set based on the informational value of the candidates. We demonstrate the broad range of contexts in which our design principle can be implemented. In particular, when combined with information-directed selection, top-two Thompson sampling achieves asymptotic optimality in Gaussian best-arm identification, resolving a notable open question in the pure-exploration literature. Our algorithm attains optimality in $\varepsilon$-best-arm identification (or ranking and selection with a probability of good selection guarantee) and thresholding bandits. Our results provide a general principle for adapting Thompson sampling to general pure-exploration problems. Numerical experiments highlight the efficiency of our proposed algorithms compared to existing methods.
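To make the top-two idea concrete, the following is a minimal sketch of a top-two Thompson sampling loop for Gaussian best-arm identification. The abstract does not spell out the selection rule, so everything specific here is an assumption made for illustration: a flat prior with known common standard deviation `sigma`, the resampling cap `max_resample`, the fallback challenger, and in particular the rule that measures the leader with probability `n_challenger / (n_leader + n_challenger)`. The function names `top_two_thompson_step` and `run` are hypothetical.

```python
import random


def top_two_thompson_step(means, counts, sigma, rng, max_resample=200):
    """Choose one arm to measure using a top-two Thompson sampling step."""

    def sample_posterior():
        # Assumed model: flat prior, known sigma, so the posterior of arm i
        # is N(means[i], sigma^2 / counts[i]).
        return [rng.gauss(m, sigma / n ** 0.5) for m, n in zip(means, counts)]

    def argmax(xs):
        return max(range(len(xs)), key=xs.__getitem__)

    # Leader: the arm that looks best under one posterior sample.
    leader = argmax(sample_posterior())

    # Challenger: resample until a different arm looks best.
    challenger = None
    for _ in range(max_resample):
        candidate = argmax(sample_posterior())
        if candidate != leader:
            challenger = candidate
            break
    if challenger is None:
        # Fallback (assumption): best remaining arm by empirical mean.
        others = [i for i in range(len(means)) if i != leader]
        challenger = max(others, key=means.__getitem__)

    # Selection between the two candidates (assumed rule): favor whichever
    # of the pair has been measured less relative to the other.
    p_leader = counts[challenger] / (counts[leader] + counts[challenger])
    return leader if rng.random() < p_leader else challenger


def run(true_means, sigma=1.0, budget=2000, seed=0):
    """Simulate the loop and return (recommended arm, measurement counts)."""
    rng = random.Random(seed)
    k = len(true_means)
    # Measure every arm once so that each posterior is well defined.
    counts = [1] * k
    sums = [rng.gauss(mu, sigma) for mu in true_means]
    for _ in range(budget - k):
        means = [s / n for s, n in zip(sums, counts)]
        arm = top_two_thompson_step(means, counts, sigma, rng)
        sums[arm] += rng.gauss(true_means[arm], sigma)
        counts[arm] += 1
    means = [s / n for s, n in zip(sums, counts)]
    best = max(range(k), key=means.__getitem__)
    return best, counts
```

Calling `run([1.0, 0.0, -1.0])` returns the recommended arm and the number of measurements spent on each arm. Replacing a fixed coin flip between leader and challenger with a data-dependent probability is the part meant to mirror the selection step described above; the exact form used in the paper may differ.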
