Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem in which the payoffs of the arms are smooth on a graph. This framework is suitable for online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph, and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for our problem that scale either linearly or sublinearly in this dimension. Our experiments on a content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
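To make the notion of payoffs that are "smooth on a graph" concrete, the following is a minimal sketch under one common formalization, which is an assumption here rather than a definition stated above: smoothness is measured by the graph Laplacian quadratic form, which sums the weighted squared differences of the payoffs across edges. The helper names, the path graph, and the rating vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def graph_laplacian(weights):
    """Unnormalized Laplacian L = D - W of an undirected weighted graph."""
    weights = np.asarray(weights, dtype=float)
    return np.diag(weights.sum(axis=1)) - weights


def smoothness(weights, payoffs):
    """Quadratic form f^T L f, equal to the sum over edges of w_ij * (f_i - f_j)^2.

    Smaller values mean that neighboring nodes have more similar payoffs.
    (Assumed smoothness measure; not taken from the text above.)
    """
    f = np.asarray(payoffs, dtype=float)
    return float(f @ graph_laplacian(weights) @ f)


# Hypothetical path graph on four items: 0 - 1 - 2 - 3, unit edge weights.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

smooth_ratings = [1.0, 1.1, 1.2, 1.3]   # neighbors have similar ratings
rough_ratings = [1.0, -1.0, 1.0, -1.0]  # neighbors disagree strongly

print(smoothness(W, smooth_ratings))
print(smoothness(W, rough_ratings))
```

Under this measure, the first rating vector scores far lower than the second, matching the intuition that an item's expected rating should be close to those of its neighbors.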