Learning the optimal ordering of content is an important challenge in website design. The learning to rank (LTR) framework models this as a sequential problem of selecting lists of content and observing where users decide to click. Most previous work on LTR assumes that the user considers each item in the list in isolation and makes a binary choice to click or not on each. We introduce a multinomial logit (MNL) choice model to the LTR framework, which captures the behaviour of users who consider the ordered list of items as a whole and make a single choice among all the items and a no-click option. Under the MNL model, the user favours items which are either inherently more attractive or placed in a preferable position within the list. We propose upper confidence bound (UCB) algorithms to minimise regret in two settings: one where the position-dependent parameters are known, and one where they are unknown. We present theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the problem, an $\tilde{O}(\sqrt{JT})$ upper bound on the regret of the UCB algorithm in the known-parameter setting, and the first upper bound on regret, of order $\tilde{O}(K^2\sqrt{JT})$, in the more challenging unknown-position-parameter setting. Our analyses are based on tight new concentration results for geometric random variables, and novel functional inequalities for maximum likelihood estimators computed on discrete data.
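As a rough illustration of the choice model described above, the sketch below computes click probabilities for one displayed list. It assumes a multiplicative form in which the item in position k receives weight (position weight) x (item attractiveness) and the no-click option has weight 1; the abstract does not state the functional form, so this is an assumption. The function name `mnl_click_probabilities` and the example parameter values are hypothetical.

```python
def mnl_click_probabilities(attractiveness, position_weights, ranking):
    """Return (per-position click probabilities, no-click probability).

    Assumed form (not taken from the abstract): the item shown in
    position k has weight position_weights[k] * attractiveness[item],
    and the no-click option has weight 1.
    """
    weights = [
        position_weights[k] * attractiveness[item]
        for k, item in enumerate(ranking)
    ]
    denom = 1.0 + sum(weights)  # the 1.0 is the no-click option's weight
    return [w / denom for w in weights], 1.0 / denom


# Hypothetical parameters: four items, three display positions.
attractiveness = [0.9, 0.5, 0.2, 0.1]
position_weights = [1.0, 0.6, 0.3]

# Show items 0, 1, 2 in positions 0, 1, 2 respectively.
click_probs, no_click = mnl_click_probabilities(
    attractiveness, position_weights, [0, 1, 2]
)
```

Under this assumed form the user makes exactly one choice per list, so the click probabilities and the no-click probability sum to one, and moving an attractive item to a more favourable position raises its click probability.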