We consider the best arm identification problem, where the goal is to identify the arm with the highest mean reward among $K$ arms under a limited sampling budget. This problem models many practical scenarios, such as A/B testing. We study a class of algorithms for this problem that is provably minimax optimal up to a constant factor. This framework generalizes existing work on fixed-budget best arm identification, which is limited to a particular choice of risk measure. Based on the framework, we propose Almost Tracking, a closed-form algorithm with a provable guarantee on the popular risk measure $H_1$. Unlike existing algorithms, Almost Tracking neither requires the total budget in advance nor discards a significant portion of the samples, which gives it a practical advantage. Through experiments on synthetic and real-world datasets, we show that our algorithm outperforms existing anytime algorithms as well as fixed-budget algorithms.
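
To make the problem setting concrete, the following is a minimal sketch of an anytime baseline, not the Almost Tracking algorithm itself. It assumes Bernoulli rewards and a unique best arm, and it assumes the definition of $H_1$ that is common in the fixed-budget literature, namely the sum of inverse squared gaps $\sum_{i \neq i^*} 1/(\mu_{i^*} - \mu_i)^2$. The function names `h1_complexity` and `uniform_anytime_bai` are hypothetical and chosen only for illustration.

```python
import random


def h1_complexity(means):
    """Sum of 1 / gap^2 over suboptimal arms (assumed definition of H_1).

    Assumes a unique best arm; arms tied with the best are skipped.
    """
    best = max(means)
    return sum(1.0 / (best - m) ** 2 for m in means if m < best)


def uniform_anytime_bai(means, budget, seed=0):
    """Round-robin sampling of Bernoulli arms; returns the empirical best arm.

    This is a plain uniform-allocation baseline for illustration only.
    It is anytime in the sense that the sampling rule never uses `budget`;
    the loop can be stopped after any round and still give a recommendation.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        arm = t % k  # round-robin choice, independent of the total budget
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    # Empirical means; arms never sampled default to 0.0.
    estimates = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return max(range(k), key=lambda i: estimates[i])
```

For example, `uniform_anytime_bai([0.5, 0.4, 0.3], budget=3000)` samples each arm 1000 times and recommends the arm with the highest empirical mean. A larger `h1_complexity` value indicates an instance whose arms are harder to tell apart.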