We study the Bandit Clustering (BC) problem in the fixed-confidence setting, where the objective is to group a collection of data sequences (arms) into clusters by sequentially sampling from adaptively selected arms at each time step, while ensuring that the error probability at the stopping time does not exceed a prescribed level. We consider a setting in which arms in the same cluster may have different distributions. Unlike existing results for this setting, which assume Gaussian-distributed arms, we study a broader class of vector-parametric distributions that satisfy mild regularity conditions. Existing asymptotically optimal BC algorithms require solving an optimization problem as part of their sampling rule at each time step, which is computationally costly. We propose an Efficient Bandit Clustering algorithm (EBC) that, instead of solving the full optimization problem, takes a single step toward the optimal value at each time step, making it computationally efficient while remaining asymptotically optimal. We also propose a heuristic variant of EBC, called EBC-H, which further simplifies the sampling rule by selecting arms based on quantities computed as part of the stopping rule. We highlight the computational efficiency of EBC and EBC-H by comparing their per-sample run time with that of existing algorithms. Simulations on synthetic datasets support the asymptotic optimality of EBC, and simulations on both synthetic and real-world datasets show the performance gain of EBC and EBC-H over existing approaches.