In pure exploration problems, a statistician sequentially collects information to answer a question about an unknown stochastic environment. The probability of returning a wrong answer should not exceed a maximum risk parameter $δ$, and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method for solving these problems. Specifically, it is well known to enjoy asymptotically optimal sample complexity guarantees as $δ\to 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there might exist multiple correct answers (e.g., $ε$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.