Nonparametric Kernel Clustering with Bandit Feedback

Clustering with bandit feedback refers to the problem of partitioning a set of items when the clustering algorithm can sequentially query the items to receive noisy observations. The problem is formally posed as the task of partitioning the arms of an N-armed stochastic bandit according to their underlying distributions, grouping two arms together if and only if they share the same distribution, using samples collected sequentially and adaptively. This setting has gained attention in recent years due to its applicability in recommendation systems and crowdsourcing. Existing works on clustering with bandit feedback rely on the strong assumption that the underlying distributions are sub-Gaussian. As a consequence, the existing methods mainly cover settings with linearly separable clusters, which have little practical relevance. We introduce a framework of "nonparametric clustering with bandit feedback", in which the underlying arm distributions are not constrained to any parametric family, so the framework is applicable to active clustering of real-world datasets. We adopt a kernel-based approach, which allows us to reformulate the nonparametric problem as the task of clustering the arms according to their kernel mean embeddings in a reproducing kernel Hilbert space (RKHS). Building on this formulation, we introduce the KABC algorithm, provide theoretical correctness guarantees for it, and analyze its sampling budget. We introduce a notion of signal-to-noise ratio for this problem that depends on the maximum mean discrepancy (MMD) between the arm distributions and on their variance in the RKHS. Our algorithm is adaptive to this unknown quantity: it does not require it as an input yet achieves instance-dependent guarantees.
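To make the role of kernel mean embeddings and the MMD concrete, the following is a minimal sketch of how two arms could be compared from samples. It is not the KABC algorithm. The Gaussian kernel, the bandwidth of 1.0, the biased (V-statistic) estimator, the sample sizes, and all function and variable names are illustrative assumptions that do not come from the abstract. The example draws two arms with the same mean and variance but different distributions, a case that mean-based clustering cannot separate, and compares their estimated squared MMD with that of two arms sharing one distribution.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on real-valued scalars (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    Equals the squared RKHS distance between the two empirical
    kernel mean embeddings.
    """
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical arms: A and C are standard Gaussian; B takes values -1 or +1
# with equal probability, so it also has mean 0 and variance 1.
arm_a = [rng.gauss(0.0, 1.0) for _ in range(300)]
arm_b = [rng.choice([-1.0, 1.0]) for _ in range(300)]
arm_c = [rng.gauss(0.0, 1.0) for _ in range(300)]

mmd_same = mmd_squared(arm_a, arm_c)  # arms with the same distribution
mmd_diff = mmd_squared(arm_a, arm_b)  # same mean and variance, different law

print(f"squared MMD, same distribution:      {mmd_same:.4f}")
print(f"squared MMD, different distribution: {mmd_diff:.4f}")
```

With enough samples, the estimate for the two Gaussian arms should be close to zero, while the estimate for the Gaussian and two-point arms should stay clearly positive even though their means coincide. How many samples each arm needs, and how that budget is allocated adaptively, is the question the abstract's signal-to-noise ratio addresses; this sketch simply uses a fixed sample size.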
