This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in terms of sample complexity). In the stochastic BAI literature, there have been advances in designing algorithms that achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist computationally efficient approaches, but they have provable gaps to the optimal performance (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieve optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine that sequentially estimates the optimal allocations to sufficient fidelity. Specifically, these estimates are accurate enough to identify the best arm (hence achieving optimality), but not more accurate than necessary, which would incur excessive computational cost (hence maintaining efficiency). Furthermore, the existing relevant literature focuses on the exponential family of distributions. This paper considers the more general setting of an arbitrary family of distributions parameterized by their means (under mild regularity conditions). Optimality is established analytically, and numerical evaluations are provided to assess the analytical guarantees and to compare the performance with that of existing algorithms.
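To make "optimal allocations" concrete, the sketch below computes them for the classical special case of Gaussian arms with a known common variance $\sigma^2$, following the characterization of Garivier and Kaufmann (2016). It is not the algorithm proposed in this paper, and the helper name `gaussian_optimal_allocation` is hypothetical. The oracle allocation $w^*$ maximizes $\min_{a \neq a^*} \frac{w_{a^*} w_a}{w_{a^*} + w_a} \cdot \frac{(\mu_{a^*} - \mu_a)^2}{2\sigma^2}$ over the probability simplex, and the optimal value equals $1/T^*(\mu)$, where $T^*(\mu)$ governs the optimal sample complexity. For Gaussians this reduces to a one-dimensional bisection; for other families each bisection step typically needs further inner root-finding, which suggests why optimization-based methods that re-solve this problem at every round can be costly.

```python
def gaussian_optimal_allocation(mu, sigma=1.0, iters=100):
    """Hypothetical helper: oracle allocation w* for fixed-confidence BAI with
    Gaussian arms of known common variance sigma**2 (Garivier-Kaufmann style).

    Returns (w, inv_tstar): w sums to 1 and inv_tstar = 1 / T*(mu).
    """
    k = len(mu)
    if k < 2:
        raise ValueError("need at least two arms")
    best = max(range(k), key=lambda a: mu[a])
    others = [a for a in range(k) if a != best]
    # c_a = KL(N(mu_best, sigma^2) || N(mu_a, sigma^2)) = gap^2 / (2 sigma^2)
    c = {a: (mu[best] - mu[a]) ** 2 / (2 * sigma ** 2) for a in others}
    if min(c.values()) <= 0:
        raise ValueError("the best arm must be unique")

    def ratios(y):
        # x_a(y) = w_a / w_best solves g_a(x) = y, where g_a(x) = c_a * x / (1 + x)
        return {a: y / (c[a] - y) for a in others}

    # For Gaussians the balance condition reduces to F(y) = sum_a x_a(y)^2 = 1.
    # F is increasing with F(0) = 0, and the arm with the smallest c_a alone
    # gives F(min c / 2) >= 1, so the root lies in [0, min c / 2].
    lo, hi = 0.0, 0.5 * min(c.values())
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(x * x for x in ratios(mid).values()) < 1.0:
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    x = ratios(y)
    w_best = 1.0 / (1.0 + sum(x.values()))
    w = [0.0] * k
    w[best] = w_best
    for a in others:
        w[a] = x[a] * w_best
    # At the optimum every pairwise constraint is tight: 1 / T*(mu) = w_best * y
    return w, w_best * y
```

For example, `gaussian_optimal_allocation([1.0, 0.0])` should return weights close to `[0.5, 0.5]` and `1 / inv_tstar` close to 8, matching the two-arm value $T^*(\mu) = 8\sigma^2/\Delta^2$ with $\sigma = \Delta = 1$.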