In many contexts involving ranked preferences, agents submit partial orders over available alternatives. Statistical models often treat these as marginal in the space of total orders, but this approach overlooks information contained in the list length itself. In this work, we introduce and taxonomize approaches for jointly modeling distributions over top-$k$ partial orders and list lengths $k$. We consider two classes of approaches: composite models, which view a partial order as a truncation of a total order, and augmented ranking models, which model the construction of the list as a sequence of choice decisions, including the decision to stop. For composite models, we consider three dependency structures for jointly modeling order and truncation length. For augmented ranking models, we consider different assumptions on how the stop-token choice is modeled. Using partial rankings from San Francisco school choice and San Francisco ranked choice elections, we evaluate how well the models predict observed data and generate realistic synthetic datasets. We find that composite models that explicitly model length as a categorical variable produce synthetic datasets with accurate length distributions, and that an augmented model with position-dependent item utilities jointly models length and preferences in the training data best, as measured by negative log loss. Methods from this work have significant implications for the simulation and evaluation of real-world social systems that solicit ranked preferences.
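To make the distinction between the two model classes concrete, the following is a minimal sketch, not the formulation used in this work. It assumes a Plackett-Luce base ranking model, a composite model in which length is categorical and independent of order (only one of several possible dependency structures), and an augmented model with a single position-independent stop utility. The function names, `length_probs`, and `stop_utility` are hypothetical and chosen for illustration only.

```python
import itertools
import math


def all_partial_rankings(utilities):
    """Enumerate every top-k list (k = 1..n) over the items in `utilities`."""
    items = list(utilities)
    for k in range(1, len(items) + 1):
        yield from itertools.permutations(items, k)


def composite_log_prob(ranking, utilities, length_probs):
    """Composite sketch: log P(k) + log P(top-k prefix of a Plackett-Luce order).

    Assumption: length k is a categorical variable independent of the order.
    """
    remaining = dict(utilities)
    logp = math.log(length_probs[len(ranking)])
    for item in ranking:
        denom = sum(math.exp(u) for u in remaining.values())
        logp += remaining[item] - math.log(denom)
        del remaining[item]
    return logp


def augmented_log_prob(ranking, utilities, stop_utility):
    """Augmented sketch: a stop token competes with the remaining items.

    Assumptions: the first position must be an item (lists are non-empty),
    the stop utility is the same at every position, and a list that ranks
    every item ends automatically.
    """
    remaining = dict(utilities)
    logp = 0.0
    for pos, item in enumerate(ranking):
        denom = sum(math.exp(u) for u in remaining.values())
        if pos > 0:
            denom += math.exp(stop_utility)  # stop was available but not chosen
        logp += remaining[item] - math.log(denom)
        del remaining[item]
    if remaining:
        # The list ended early, so the stop token was chosen at this position.
        denom = sum(math.exp(u) for u in remaining.values()) + math.exp(stop_utility)
        logp += stop_utility - math.log(denom)
    return logp


if __name__ == "__main__":
    # Hypothetical utilities and length distribution for three alternatives.
    utilities = {"a": 1.0, "b": 0.0, "c": -0.5}
    length_probs = {1: 0.5, 2: 0.3, 3: 0.2}
    ballot = ("a", "c")
    print("composite:", composite_log_prob(ballot, utilities, length_probs))
    print("augmented:", augmented_log_prob(ballot, utilities, stop_utility=0.2))
```

In this sketch both functions define proper joint distributions over (order, length) pairs, so their probabilities sum to one over all partial rankings. The composite version controls the length distribution directly through `length_probs`, whereas in the augmented version the length distribution emerges from how the stop utility compares with the utilities of the remaining items.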