We study the problem of identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real-world applications, such as content recommendation and online advertising. In practice, user dependency plays an essential role in users' actions, and thus in the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Unlike traditional clustering settings, we cluster users based on the unknown bandit parameters, which are estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB and propose a bandit algorithm, LOCB, with an embedded local clustering procedure. We also provide a theoretical analysis of LOCB in terms of the correctness and efficiency of its clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects and show that it outperforms state-of-the-art baselines.