Top-2 methods have become popular for solving the best arm identification (BAI) problem. The best arm, that is, the arm with the largest mean among finitely many, is identified by an algorithm that at each sequential step independently pulls the empirical best arm with a fixed probability $\beta$ and pulls the best challenger arm otherwise. The probability of incorrect selection is guaranteed to lie below a specified $\delta > 0$. Information-theoretic lower bounds on sample complexity are well known for the BAI problem and are matched asymptotically as $\delta \rightarrow 0$ by computationally demanding plug-in methods. The above top-2 algorithm, for any $\beta \in (0,1)$, has sample complexity within a constant factor of the lower bound. However, determining the optimal $\beta$ that matches the lower bound has proven difficult. In this paper, we address this difficulty and propose an optimal top-2 type algorithm. We consider a function of allocations anchored at a threshold. If it exceeds the threshold, the algorithm samples the empirical best arm; otherwise, it samples the challenger arm. We show that the proposed algorithm is optimal as $\delta \rightarrow 0$. Our analysis relies on identifying the limiting fluid dynamics of allocations, which satisfy a series of ordinary differential equations (ODEs) pasted together and describe the asymptotic path followed by our algorithm. We rely on the implicit function theorem to show existence and uniqueness of solutions to these fluid ODEs and to show that the proposed algorithm remains close to the ODE solution.
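For readers unfamiliar with the baseline, the fixed-$\beta$ top-2 sampling rule described above can be sketched as follows. This is only an illustrative sketch under assumptions not stated in the abstract: arms are unit-variance Gaussian, the challenger is taken to be the arm with the smallest Gaussian transportation cost to the empirical best arm, and a fixed sampling budget stands in for the $\delta$-dependent stopping rule. The names `challenger` and `beta_top2` are hypothetical. It does not implement the threshold-anchored rule proposed in the paper, whose anchor function is not specified here.

```python
import numpy as np


def challenger(means, counts, best):
    """Return the arm (other than `best`) with the smallest Gaussian
    transportation cost to the empirical best arm (assumed unit variance)."""
    costs = np.full(len(means), np.inf)
    for a in range(len(means)):
        if a == best:
            continue
        gap = means[best] - means[a]  # non-negative, since `best` is the argmax
        costs[a] = gap ** 2 / (2.0 * (1.0 / counts[best] + 1.0 / counts[a]))
    return int(np.argmin(costs))


def beta_top2(true_means, beta=0.5, budget=2000, seed=0):
    """Run the fixed-beta top-2 rule for `budget` pulls (assumption: a fixed
    budget replaces the delta-dependent stopping rule of the BAI setting)."""
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)

    # Initialise by pulling every arm once.
    sums = rng.normal(true_means, 1.0)
    counts = np.ones(k)

    for _ in range(budget - k):
        means = sums / counts
        best = int(np.argmax(means))
        # With probability beta pull the empirical best arm,
        # otherwise pull the best challenger.
        if rng.random() < beta:
            arm = best
        else:
            arm = challenger(means, counts, best)
        sums[arm] += rng.normal(true_means[arm], 1.0)
        counts[arm] += 1

    return int(np.argmax(sums / counts)), counts
```

Calling `beta_top2([1.0, 0.5, 0.0])` returns the recommended arm index together with the per-arm pull counts. Dividing the counts by the budget gives the empirical allocations, the quantities whose limiting behaviour the fluid analysis tracks.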