We study the problem of online clustering within the multi-armed bandit framework under the fixed-confidence setting. In this multi-armed bandit problem, we have $M$ arms, each providing i.i.d. samples that follow a multivariate Gaussian distribution with an {\em unknown} mean and a known identity covariance matrix. The arms are grouped into $K$ clusters by applying the Single Linkage (SLINK) clustering algorithm to the means of the arms, so that the grouping is determined by the distances between those means. Since the true means are unknown, the objective is to recover this clustering of the arms with the minimum number of samples drawn from the arms, subject to an upper bound on the error probability. We introduce a novel algorithm, Average Tracking Bandit Online Clustering (ATBOC), and prove that this algorithm is order optimal, meaning that the upper bound on its expected sample complexity for a given error probability $\delta$ is within a factor of 2 of an instance-dependent lower bound as $\delta \rightarrow 0$. Furthermore, we propose a computationally more efficient algorithm, Lower and Upper Confidence Bound-based Bandit Online Clustering (LUCBBOC), inspired by the LUCB algorithm for best arm identification. Simulation results demonstrate that the performance of LUCBBOC is comparable to that of ATBOC. We assess the effectiveness of the proposed algorithms through numerical experiments on both synthetic datasets and the real-world MovieLens dataset. To the best of our knowledge, this is the first work on bandit online clustering that allows arms within a cluster to have different means and allows $K$ to be greater than 2.
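As an illustration of the problem setup only, the following minimal sketch draws unit-variance Gaussian samples from each arm, forms empirical means, and groups the arms into $K$ clusters by single linkage. It is not ATBOC or LUCBBOC: it assumes a naive uniform sampling rule with a fixed budget per arm, and the function names (`single_linkage`, `empirical_means`), the example means, and the budget of 2000 samples are hypothetical choices made for this example.

```python
import itertools
import math
import random


def single_linkage(points, k):
    """Group points into k clusters by single linkage.

    Merges the closest pair of clusters (Kruskal-style, with union-find)
    until exactly k clusters remain. Returns one label per point,
    numbered in order of first appearance.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs of points, sorted by Euclidean distance (shortest first).
    pairs = sorted(
        itertools.combinations(range(len(points)), 2),
        key=lambda p: math.dist(points[p[0]], points[p[1]]),
    )
    clusters = len(points)
    for i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


def empirical_means(true_means, n_samples, rng):
    """Sample each arm n_samples times (identity covariance) and average."""
    estimates = []
    for mean in true_means:
        sums = [0.0] * len(mean)
        for _ in range(n_samples):
            for d, m in enumerate(mean):
                sums[d] += rng.gauss(m, 1.0)
        estimates.append(tuple(s / n_samples for s in sums))
    return estimates


# Hypothetical instance: M = 5 arms in 2 dimensions, K = 3 clusters.
# Arms in the same cluster have different means.
true_means = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0), (10.0, 0.0)]
rng = random.Random(0)
estimated = empirical_means(true_means, n_samples=2000, rng=rng)
print(single_linkage(true_means, 3))
print(single_linkage(estimated, 3))
```

Uniform sampling of this kind ignores how hard each arm is to place; the fixed-confidence objective described above is to reach the correct clustering with as few samples as possible for a given $\delta$, which requires an adaptive sampling and stopping rule.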