We address the problem of multi-group mean estimation, which seeks to allocate a finite sampling budget across multiple groups so as to obtain uniformly accurate estimates of their means. Unlike classical multi-armed bandits, whose objective is to minimize regret by identifying and exploiting the best arm, the optimal allocation in this setting requires sampling every group $\Theta(T)$ times. This fundamental distinction makes exploration-free algorithms both natural and effective. Our work makes three contributions. First, we strengthen existing results on subgaussian variance concentration using the Hanson-Wright inequality and identify a class of strictly subgaussian distributions that yields sharper guarantees. Second, we design exploration-free non-adaptive and adaptive algorithms, and we establish regret bounds tighter than those previously known. Third, we extend the framework to contextual bandit settings, an underexplored direction, and propose algorithms that leverage side information with provable guarantees. Overall, these results position exploration-free allocation as a principled and efficient approach to multi-group mean estimation, with potential applications in experimental design, personalization, and other domains requiring accurate multi-group inference.