We propose a novel technique for analyzing adaptive sampling, called the {\em Simulator}. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strengths of both Fano and change-of-measure techniques without succumbing to the limitations of either method. For concreteness, we apply our techniques to a structured multi-armed bandit problem in the fixed-confidence pure exploration setting. There we show that the constraints on the means imply a substantial gap between the moderate-confidence sample complexity and the asymptotic sample complexity as $\delta \to 0$ found in the literature. We also prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log factors. Moreover, our lower bounds zero in on the number of times each \emph{individual} arm needs to be pulled, uncovering new phenomena that are drowned out in the aggregate sample complexity. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification. For the latter problem, it is the first {\em practical} algorithm of its kind that removes extraneous log factors, and it outperforms the state of the art in experiments.