This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed-confidence setting, considering the general class of exponential-family bandits. Existing algorithms for exponential-family bandits face computational challenges. To mitigate these challenges, the BAI problem is viewed and analyzed as a sequential composite hypothesis testing task, and a framework is proposed that adopts likelihood ratio-based tests, which are known to be effective for sequential testing. Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests for arm selection and is amenable to tractable analysis for the exponential family of bandits. The algorithm has two key features: (1) its sample complexity is asymptotically optimal, and (2) it is guaranteed to be $\delta$-PAC. In contrast, existing efficient approaches focus on the Gaussian setting and require Thompson sampling for the arm deemed the best and the challenger arm. Additionally, this paper analytically quantifies the computational expense of identifying the challenger in an existing approach. Finally, numerical experiments are provided to support the analysis.
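To make the likelihood ratio viewpoint concrete, the following minimal sketch computes the standard pairwise generalized likelihood ratio statistic used in the fixed-confidence BAI literature for one-parameter exponential families. It is an illustration under stated assumptions, not the algorithm proposed in this paper: the function names (`kl_gaussian`, `kl_bernoulli`, `glr_statistic`) are hypothetical, the arms are summarized by their empirical means and pull counts, and the stopping threshold is left to the caller because its exact form is not given here.

```python
import math


def kl_gaussian(x, y, sigma=1.0):
    """KL divergence between N(x, sigma^2) and N(y, sigma^2)."""
    return (x - y) ** 2 / (2.0 * sigma ** 2)


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def glr_statistic(means, counts, kl):
    """Return (empirical best arm, challenger, smallest pairwise GLR statistic).

    Illustrative assumption: for the empirical best arm a and any other arm b,
    the statistic is n_a * kl(mu_a, m) + n_b * kl(mu_b, m), where m is the
    count-weighted average of the two empirical means. Every arm is assumed to
    have been pulled at least once.
    """
    best = max(range(len(means)), key=lambda a: means[a])
    stats = {}
    for b in range(len(means)):
        if b == best:
            continue
        n_a, n_b = counts[best], counts[b]
        pooled = (n_a * means[best] + n_b * means[b]) / (n_a + n_b)
        stats[b] = n_a * kl(means[best], pooled) + n_b * kl(means[b], pooled)
    # The challenger is the arm that is hardest to separate from the best one.
    challenger = min(stats, key=stats.get)
    return best, challenger, stats[challenger]


if __name__ == "__main__":
    best, challenger, z = glr_statistic([0.2, 0.9, 0.5], [5, 5, 5], kl_gaussian)
    print(best, challenger, z)
```

A sequential test in this style would stop and recommend `best` once the returned statistic exceeds a threshold chosen to guarantee the $\delta$-PAC property. Finding the challenger here costs one divergence evaluation per arm, which is cheap for Gaussian arms but can require numerical optimization in more general models.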