Although the foundations of ranking are well established, the ranking literature has focused primarily on simple, unimodal models, e.g., the Mallows and Plackett-Luce models, that define distributions centered around a single total ordering. Explicit mixture models have provided some tools for modeling multimodal ranking data, though learning such models from data is often difficult. In this work, we contribute a contextual repeated selection (CRS) model that leverages recent advances in choice modeling to bring a natural multimodality and richness to the space of rankings. We provide rigorous theoretical guarantees for maximum likelihood estimation under the model through structure-dependent tail risk and expected risk bounds. As a by-product, we also furnish the first tight bounds on the expected risk of maximum likelihood estimators for the multinomial logit (MNL) choice model and the Plackett-Luce (PL) ranking model, as well as the first tail risk bound for the PL ranking model. The CRS model significantly outperforms existing methods for modeling real-world ranking data in a variety of settings, from racing to ranked-choice voting.
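Under the PL model, a ranking is generated by repeated selection: the top item is chosen from all items by an MNL choice, the next item is chosen by an MNL choice from the items that remain, and so on. The minimal Python sketch below computes the PL log-likelihood in exactly this way, assuming fixed real-valued item utilities. The function name `pl_log_likelihood` and the example utilities are hypothetical, and the context-dependent utilities that distinguish the CRS model are not shown; this is only the plain PL special case.

```python
import math
from itertools import permutations
from typing import Sequence


def pl_log_likelihood(ranking: Sequence[int], utilities: Sequence[float]) -> float:
    """Log-likelihood of a full ranking under Plackett-Luce (hypothetical helper).

    The ranking is scored as a sequence of MNL choices: at each step the chosen
    item is drawn from the still-unranked items with probability
    exp(u_item) / sum_{j remaining} exp(u_j).
    """
    remaining = list(ranking)
    total = 0.0
    for item in ranking:
        # Numerically stable log-sum-exp over the remaining choice set.
        m = max(utilities[j] for j in remaining)
        log_norm = m + math.log(sum(math.exp(utilities[j] - m) for j in remaining))
        total += utilities[item] - log_norm
        # The chosen item leaves the choice set for all later selections.
        remaining.remove(item)
    return total


# Usage: PL probabilities over all rankings of three items sum to 1.
u = [1.0, 0.0, -0.5]
total_prob = sum(math.exp(pl_log_likelihood(p, u)) for p in permutations(range(3)))
```

With equal utilities over three items, the sketch assigns each ranking probability 1/3 × 1/2 × 1 = 1/6, the uniform distribution over the 3! orderings.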