We study the problem of online clustering of data sequences in the multi-armed bandit (MAB) framework under the fixed-confidence setting. There are $M$ arms, each providing i.i.d. samples from a parametric distribution whose parameters are unknown. The $M$ arms form $K$ clusters based on the distances between their true parameters. In the MAB setting, one arm can be sampled at each time step. The objective is to estimate the clusters of the arms using as few samples as possible, subject to an upper bound $\delta$ on the error probability. Our setting allows for arms within a cluster to have non-identical distributions, vector-parameter arms, vector observations, and $K \le M$ clusters. We propose and analyze the Average Tracking Bandit Online Clustering (ATBOC) algorithm. ATBOC is asymptotically order-optimal for multivariate Gaussian arms, with expected sample complexity growing at most twice as fast as the lower bound as $\delta \rightarrow 0$, and this guarantee extends to multivariate sub-Gaussian arms. For single-parameter exponential family arms, ATBOC is asymptotically optimal, matching the lower bound. We also propose two computationally more efficient alternatives: the Lower and Upper Confidence Bound based Bandit Online Clustering Algorithm (LUCBBOC) and Bandit Online Clustering-Elimination (BOC-ELIM). We derive the computational complexity of the proposed algorithms and compare their per-sample runtime through simulations. LUCBBOC and BOC-ELIM require lower per-sample runtime than ATBOC while achieving comparable performance. All the proposed algorithms are $\delta$-probably correct, i.e., the error probability of the cluster estimate at the stopping time is at most $\delta$. We validate the asymptotic optimality guarantees through simulations, and compare our proposed algorithms with related work through simulations on both synthetic and real-world datasets.
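To make the problem setting concrete, the following is a minimal sketch of a naive baseline, not of ATBOC, LUCBBOC, or BOC-ELIM. It assumes Gaussian arms with identity-scaled covariance, single-linkage clustering as the rule that turns parameter distances into $K$ clusters, and uniform round-robin sampling with a fixed budget instead of a fixed-confidence stopping rule. All names (`single_linkage_clusters`, `round_robin_estimate`, `TRUE_MEANS`) are hypothetical and chosen for illustration only.

```python
import random


def single_linkage_clusters(points, k):
    """Merge the two closest clusters until k remain (single linkage).

    Assumption: single linkage is used here only as one plausible way to
    form clusters from pairwise parameter distances.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Euclidean distance between two parameter vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def gap(c1, c2):
        # Single-linkage distance: closest pair of points across two clusters
        return min(dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        pairs = (
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda pq: gap(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


def round_robin_estimate(true_means, k, pulls_per_arm, rng, sigma=1.0):
    """Pull every arm equally often, then cluster the empirical means.

    This is a fixed-budget uniform-sampling baseline, not a fixed-confidence
    algorithm: it has no stopping rule tied to an error level delta.
    """
    dim = len(true_means[0])
    sums = [[0.0] * dim for _ in true_means]
    for _ in range(pulls_per_arm):
        for arm, mu in enumerate(true_means):
            for d in range(dim):
                # One vector observation per pull, independent coordinates
                sums[arm][d] += rng.gauss(mu[d], sigma)
    estimates = [[s / pulls_per_arm for s in row] for row in sums]
    return single_linkage_clusters(estimates, k)


# Hypothetical instance: M = 5 arms with 2-dimensional means, K = 3 clusters.
# Arms within a cluster have close but non-identical parameters.
TRUE_MEANS = [(0.0, 0.0), (0.3, 0.1), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0)]

true_clusters = single_linkage_clusters(TRUE_MEANS, k=3)
estimated = round_robin_estimate(
    TRUE_MEANS, k=3, pulls_per_arm=2000, rng=random.Random(0)
)
print("true clusters:     ", true_clusters)
print("estimated clusters:", estimated)
```

Uniform sampling spends the same number of pulls on every arm, including arms whose cluster membership is easy to determine. Adaptive sampling rules of the kind studied here aim to reduce the total number of samples needed for a given error level $\delta$ by choosing which arm to pull next based on the observations so far.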