We investigate the Active Clustering Problem (ACP). A learner interacts with an $N$-armed stochastic bandit with $d$-dimensional subGaussian feedback. There exists a hidden partition of the arms into $K$ groups such that arms within the same group share the same mean vector. The learner's task is to uncover this hidden partition with the smallest budget, i.e., the smallest number of observations, and with a probability of error smaller than a prescribed constant $\delta$. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. We improve on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the active setting.
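To make the setting concrete, the following is a minimal Python sketch of the uniform sampling baseline, not of the ACB algorithm. It rests on several assumptions that are not taken from the paper: the noise is isotropic Gaussian (one instance of subGaussian feedback), the clustering step is a simple farthest-point traversal followed by nearest-center assignment, and every name (`uniform_sampling_clustering`, `same_partition`, `pulls_per_arm`, `sigma`) is hypothetical.

```python
import numpy as np


def uniform_sampling_clustering(rng, means, K, pulls_per_arm, sigma=1.0):
    """Illustrative uniform-sampling baseline (not the ACB algorithm).

    means: (N, d) array of true mean vectors, hidden from a real learner and
           used here only to simulate the feedback.
    Returns the estimated group label of each arm and the budget spent.
    """
    N, d = means.shape

    # Uniform allocation: every arm receives the same number of pulls.
    est = np.zeros((N, d))
    for arm in range(N):
        # Assumption: Gaussian noise, which is subGaussian.
        obs = means[arm] + sigma * rng.standard_normal((pulls_per_arm, d))
        est[arm] = obs.mean(axis=0)

    # Farthest-point traversal: pick K arms whose empirical means are far apart.
    centers = [0]
    dist = np.linalg.norm(est - est[0], axis=1)
    while len(centers) < K:
        nxt = int(np.argmax(dist))
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(est - est[nxt], axis=1))

    # Assign each arm to the closest selected center.
    to_centers = np.linalg.norm(est[:, None, :] - est[centers][None, :, :], axis=2)
    labels = np.argmin(to_centers, axis=1)

    budget = N * pulls_per_arm  # total number of observations used
    return labels, budget


def same_partition(a, b):
    """True if two labelings induce the same partition, up to relabeling."""
    a, b = np.asarray(a), np.asarray(b)
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # K = 3, d = 2
    truth = [0, 0, 1, 1, 2, 2, 2]                                   # N = 7 arms
    labels, budget = uniform_sampling_clustering(
        rng, group_means[truth], K=3, pulls_per_arm=50
    )
    print("budget:", budget, "recovered:", same_partition(labels, truth))
```

Uniform allocation spends the same number of observations on every arm, including arms whose group membership is already clear. An active strategy is free to allocate observations adaptively, which is the kind of saving the abstract refers to when comparing against uniform sampling.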