Given a finite set of unknown distributions, or arms, that can be sampled, we consider the problem of identifying the arm with the largest mean using a $\delta$-correct algorithm (an adaptive, sequential algorithm that keeps the probability of error below a specified $\delta$) with minimum sample complexity. Lower bounds on the sample complexity of $\delta$-correct algorithms are well known. When arm distributions are restricted to a single-parameter exponential family, $\delta$-correct algorithms that asymptotically match the lower bound as $\delta \to 0$ have been developed previously. In this paper, we first observe a negative result showing that some restriction is essential: without one, any $\delta$-correct algorithm would require an infinite expected number of samples for distributions with unbounded support. We then propose a $\delta$-correct algorithm that matches the lower bound as $\delta \to 0$ under the mild restriction that a known upper bound on the $(1+\epsilon)$-th moment of the underlying random variables exists, for some $\epsilon > 0$. We also propose batch processing and identify near-optimal batch sizes that substantially speed up the proposed algorithm. The best-arm problem has many learning applications, including recommendation systems and product selection; it is also a well-studied classic problem in the simulation community.
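To make the well-known lower bound concrete, the sketch below computes it in the simplest single-parameter exponential-family case: Gaussian arms with a known common variance $\sigma^2$ (an assumption; the heavy-tailed setting considered here replaces the Gaussian KL divergence with a more general quantity that this sketch does not model). In that case the bound reads $\liminf_{\delta \to 0} \mathbb{E}[\tau_\delta]/\log(1/\delta) \ge T^*(\mu)$, where $T^*(\mu)^{-1} = \max_{w \in \Sigma_K} \min_{b \ne a^*} \frac{w_{a^*} w_b}{w_{a^*} + w_b} \frac{(\mu_{a^*} - \mu_b)^2}{2\sigma^2}$, $a^*$ is the best arm, and $\Sigma_K$ is the probability simplex of sampling proportions. The helper names `characteristic_time` and `asymptotic_lower_bound` and the brute-force grid search are hypothetical illustrations, not the paper's algorithm.

```python
import itertools
import math


def gaussian_cost(w_best, w_b, gap, sigma):
    """Minimal weighted KL cost of moving the means so arm b ties the best arm
    (Gaussian arms with common known variance sigma**2)."""
    if w_best + w_b == 0.0:
        return 0.0
    return (w_best * w_b / (w_best + w_b)) * gap ** 2 / (2.0 * sigma ** 2)


def characteristic_time(mu, sigma=1.0, grid_step=0.01):
    """Approximate T*(mu) by grid search over the simplex of sampling weights.
    Hypothetical helper; cost grows exponentially in len(mu), so small K only."""
    k = len(mu)
    if k < 2:
        raise ValueError("need at least two arms")
    best = max(range(k), key=lambda a: mu[a])
    if sum(1 for m in mu if m == mu[best]) > 1:
        raise ValueError("the best arm must be unique")
    n = round(1.0 / grid_step)
    best_value, best_w = 0.0, None
    # Enumerate weights w_a = c_a / n with nonnegative integers c_a summing to n.
    for counts in itertools.product(range(n + 1), repeat=k - 1):
        last = n - sum(counts)
        if last < 0:
            continue
        w = [c / n for c in counts] + [last / n]
        # Inner minimisation: the cheapest alternative makes some arm b the best.
        value = min(
            gaussian_cost(w[best], w[b], mu[best] - mu[b], sigma)
            for b in range(k)
            if b != best
        )
        if value > best_value:
            best_value, best_w = value, w
    return 1.0 / best_value, best_w


def asymptotic_lower_bound(mu, delta, sigma=1.0):
    """Leading-order term T*(mu) * log(1/delta) of the expected sample size,
    meaningful only asymptotically as delta -> 0."""
    t_star, _ = characteristic_time(mu, sigma)
    return t_star * math.log(1.0 / delta)
```

For two arms with means `[1.0, 0.0]` and $\sigma = 1$, the optimal weights are equal and $T^* = 8\sigma^2/\Delta^2 = 8$, so the expected sample size of any $\delta$-correct algorithm grows at least like $8 \log(1/\delta)$ for small $\delta$. An algorithm that matches the lower bound must sample the arms in proportions close to the maximizing $w$.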