The recent literature on online learning to rank (LTR) has established the utility of prior knowledge for Bayesian ranking bandit algorithms. However, a major limitation of existing work is the requirement that the prior used by the algorithm match the true prior. In this paper, we propose and analyze adaptive algorithms that address this issue, and we extend these results to linear and generalized linear models. We also consider scalar relevance feedback in addition to click feedback. Finally, we demonstrate the efficacy of our algorithms through experiments on both synthetic and real-world data.
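The abstract assumes familiarity with Bayesian ranking bandits. As a rough illustration of where the prior enters such an algorithm, here is a minimal sketch of Thompson sampling in a cascading bandit with independent Beta priors on each item's attraction probability. The cascade click model, the Beta-Bernoulli posterior, and all function and parameter names (`cascade_click`, `cascade_thompson_sampling`, `prior`) are assumptions made for illustration. This is not the adaptive algorithm proposed in the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple

# ASSUMPTION: illustrative cascade-model Thompson sampling with Beta priors;
# not the paper's adaptive algorithm.


def cascade_click(ranking: Sequence[int], attraction: Sequence[float],
                  rng: random.Random) -> Optional[int]:
    """Simulate one user under the cascade model: scan the list top-down,
    click the first attractive item, then stop. Returns the clicked
    position, or None if nothing was clicked."""
    for pos, item in enumerate(ranking):
        if rng.random() < attraction[item]:
            return pos
    return None


def cascade_thompson_sampling(attraction: Sequence[float], k: int, horizon: int,
                              prior: Tuple[float, float] = (1.0, 1.0),
                              seed: int = 0) -> Tuple[List[float], List[float], int]:
    """Thompson sampling for a cascading ranking bandit.

    `prior` gives the Beta(alpha, beta) pseudo-counts assumed for every item.
    Returns the posterior parameters per item and the total number of clicks.
    """
    rng = random.Random(seed)
    n = len(attraction)
    alpha = [prior[0]] * n  # pseudo-counts of "attractive" observations
    beta = [prior[1]] * n   # pseudo-counts of "not attractive" observations
    total_clicks = 0
    for _ in range(horizon):
        # Draw one plausible attraction probability per item from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Show the k items with the largest sampled attraction.
        ranking = sorted(range(n), key=lambda i: samples[i], reverse=True)[:k]
        click_pos = cascade_click(ranking, attraction, rng)
        # Items up to and including the click (all k if no click) were examined;
        # items below the click give no feedback under the cascade model.
        examined = len(ranking) if click_pos is None else click_pos + 1
        for pos in range(examined):
            item = ranking[pos]
            if pos == click_pos:
                alpha[item] += 1
                total_clicks += 1
            else:
                beta[item] += 1
    return alpha, beta, total_clicks


if __name__ == "__main__":
    true_attraction = [0.5, 0.4, 0.3, 0.1, 0.05, 0.05]
    a, b, clicks = cascade_thompson_sampling(true_attraction, k=2, horizon=5000)
    print("clicks:", clicks)
    print("posterior means:", [round(x / (x + y), 3) for x, y in zip(a, b)])
```

Because `prior` fixes the initial pseudo-counts for every item, a prior far from the true attraction distribution skews early rankings in this sketch; removing the need for the algorithm's prior to match the true prior is the issue the abstract says the paper's adaptive algorithms address.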