We consider the best arm identification problem, where the goal is to identify the arm with the highest mean reward among $K$ arms under a limited sampling budget. This problem models many practical scenarios, such as A/B testing. We study a class of algorithms for this problem that is provably minimax optimal up to a constant factor. This framework generalizes existing work on fixed-budget best arm identification, which is limited to particular choices of risk measure. Based on this framework, we propose Almost Tracking, a closed-form algorithm with a provable guarantee on the popular risk measure $H_1$. Unlike existing algorithms, Almost Tracking neither requires the total budget in advance nor discards a significant portion of the samples, which gives it a practical advantage. Through experiments on synthetic and real-world datasets, we show that our algorithm outperforms existing anytime algorithms as well as fixed-budget algorithms.
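The abstract refers to the risk measure $H_1$ without defining it. The sketch below assumes it is the hardness quantity commonly written $H_1$ in the fixed-budget literature, $H_1 = \sum_i \Delta_i^{-2}$, where $\Delta_i = \mu^\star - \mu_i$ is the gap of a suboptimal arm and the best arm is assigned the smallest suboptimal gap. Some papers omit the best-arm term; because that term equals the largest suboptimal term, the two conventions differ by at most a factor of 2. The function name `h1_complexity` and its `include_best` flag are hypothetical illustrations, not part of Almost Tracking.

```python
from typing import Sequence


def h1_complexity(means: Sequence[float], include_best: bool = True) -> float:
    """Compute H_1 = sum_i 1 / Delta_i^2 for a bandit instance (illustrative sketch).

    Assumption: the best arm is unique. For each suboptimal arm i the gap is
    Delta_i = mu_best - mu_i. If include_best is True, the best arm is assigned
    the smallest suboptimal gap (one common convention); otherwise it is omitted.
    """
    if len(means) < 2:
        raise ValueError("need at least two arms")
    best = max(means)
    if sum(1 for m in means if m == best) > 1:
        raise ValueError("best arm must be unique for H_1 to be finite")
    # Gaps of the suboptimal arms only.
    gaps = [best - m for m in means if m != best]
    total = sum(1.0 / g ** 2 for g in gaps)
    if include_best:
        # The best arm contributes 1 / (smallest gap)^2 under this convention.
        total += 1.0 / min(gaps) ** 2
    return total


if __name__ == "__main__":
    # Three arms with means 1.0, 0.5, 0.75: gaps are 0.5 and 0.25.
    print(h1_complexity([1.0, 0.5, 0.75]))                      # 16 + 4 + 16
    print(h1_complexity([1.0, 0.5, 0.75], include_best=False))  # 4 + 16
```

Larger $H_1$ means a harder instance: arms whose means are close to the best one dominate the sum, so they require more samples to be distinguished from the best arm within a given budget.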