This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed-confidence setting. The general class of the exponential family of bandits is considered. The state-of-the-art algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, a novel framework is proposed that views the BAI problem as sequential hypothesis testing and is amenable to tractable analysis for the exponential family of bandits. Based on this framework, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests. This algorithm has three features for both settings: (1) its sample complexity is asymptotically optimal, (2) it is guaranteed to be $\delta$-PAC, and (3) it addresses the computational challenge of the state-of-the-art approaches. Specifically, these approaches, which focus only on the Gaussian setting, require Thompson sampling from the arm that is deemed the best and from a challenger arm. This paper shows analytically that identifying the challenger is computationally expensive and that the proposed algorithm circumvents this step. Finally, numerical experiments are provided to support the analysis.
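For readers unfamiliar with the canonical sequential probability ratio test mentioned above, the following is a minimal sketch of Wald's SPRT between two simple Gaussian hypotheses with known variance. It is a generic textbook illustration, not the BAI algorithm proposed in this paper; the function name `sprt_gaussian`, its parameters, and the error levels `alpha` and `beta` are assumptions introduced only for this example.

```python
import math
import random


def sprt_gaussian(samples, mu0, mu1, sigma=1.0, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1 (known sigma).

    Hypothetical helper for illustration only. Returns (decision, n_used),
    where decision is "H0", "H1", or "undecided" if the samples run out.
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this level
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this level
    llr = 0.0  # running log-likelihood ratio log p1(x_1..n) / p0(x_1..n)
    n = 0
    for n, x in enumerate(samples, start=1):
        # Gaussian log-likelihood ratio increment for one observation.
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n


if __name__ == "__main__":
    rng = random.Random(0)
    # Draw up to 1000 observations from the H1 distribution (mean 1).
    stream = (rng.gauss(1.0, 1.0) for _ in range(1000))
    print(sprt_gaussian(stream, mu0=0.0, mu1=1.0))
```

The test stops as soon as the accumulated log-likelihood ratio crosses either threshold, so the number of samples used is random and depends on how well separated the two hypotheses are.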