Thompson Sampling (TS) has attracted considerable interest due to its strong empirical performance, particularly in computational advertising. Despite this success, tools for analyzing its performance have appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem in which the payoffs of the choices are smooth with respect to an underlying graph. In this setting, each choice is a node of a graph, and the expected payoffs of neighboring nodes are assumed to be similar. Although the setting has applications in both recommender systems and advertising, traditional algorithms scale poorly with the number of choices. To address this, we consider an effective dimension d, which is small in real-world graphs. We provide an analysis showing that the regret of SpectralTS scales as d*sqrt(T ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d*sqrt(T ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.
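To make the smoothness assumption and the quoted regret rate concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that smoothness is measured by the graph Laplacian quadratic form, the sum over edges of w_ij * (f_i - f_j)^2, which is a common formalization but is not stated in the abstract. The function names, the 5-node path graph, and the payoff values are all hypothetical. The rate function evaluates d*sqrt(T ln N) with constants omitted.

```python
import math


def laplacian_smoothness(edges, payoffs):
    """Sum of w * (f_i - f_j)^2 over weighted edges (i, j, w).

    This equals the Laplacian quadratic form f^T L f. Small values mean
    that neighboring nodes have similar payoffs.
    """
    return sum(w * (payoffs[i] - payoffs[j]) ** 2 for i, j, w in edges)


def regret_rate(d, T, N):
    """Evaluate d * sqrt(T ln N), the rate quoted in the abstract.

    Constants and lower-order terms are omitted, so this is only useful
    for comparing how the rate grows with d, T, and N.
    """
    return d * math.sqrt(T * math.log(N))


# Hypothetical 5-node path graph with unit edge weights (an assumption).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
smooth = [0.1, 0.2, 0.3, 0.4, 0.5]  # payoffs vary slowly along the path
rough = [0.1, 0.9, 0.1, 0.9, 0.1]   # payoffs jump between neighbors

print("smooth payoffs:", laplacian_smoothness(edges, smooth))
print("rough payoffs: ", laplacian_smoothness(edges, rough))

# The rate grows linearly in d but only as sqrt(ln N) in the number of choices.
for n_choices in (100, 10_000, 1_000_000):
    print(n_choices, regret_rate(d=5, T=10_000, N=n_choices))
```

In this sketch the slowly varying payoff vector gives a much smaller quadratic form than the oscillating one, which is the sense in which neighboring nodes are "similar". The loop shows that increasing N by several orders of magnitude changes the rate only mildly when d stays fixed.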