Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node, and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension, respectively. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
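To make the notion of a payoff function that is "smooth on a graph" concrete, the following is a minimal sketch that scores smoothness with the graph Laplacian quadratic form, f^T L f, which sums the weighted squared differences of f across edges. The abstract does not state which smoothness measure is used, so this choice is an assumption; the toy path graph, the rating vectors, and the helper names `graph_laplacian` and `smoothness` are hypothetical and serve only as illustration.

```python
import numpy as np


def graph_laplacian(weights):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix W."""
    degrees = np.diag(weights.sum(axis=1))
    return degrees - weights


def smoothness(payoffs, laplacian):
    """Quadratic form f^T L f, i.e. the sum over edges of w_ij * (f_i - f_j)^2."""
    return float(payoffs @ laplacian @ payoffs)


# Hypothetical toy graph: a path over 5 items with unit edge weights.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = graph_laplacian(W)

# Ratings that change gradually along the path versus ratings that oscillate.
smooth_ratings = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
rough_ratings = np.array([1.0, 3.0, 1.0, 3.0, 1.0])

smooth_score = smoothness(smooth_ratings, L)
rough_score = smoothness(rough_ratings, L)
print(smooth_score, rough_score)
```

Under this measure, a lower score means neighboring items have more similar expected ratings, which is the structure the bandit setting described above relies on.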