Clustered Mallows Model

Rankings are a type of preference elicitation that arises in experiments where assessors arrange items, for example, in decreasing order of utility. Orderings of n items labelled {1,...,n} are permutations that reflect strict preferences. For a number of reasons, strict preferences can be an unrealistic assumption for real data. For example, when items share common traits it may be reasonable to assign them equal ranks. Also, the decisions that form a ranking may carry different levels of importance. When there is, for example, a large number of items, an assessor may wish to rank a certain number of items at the top, rank other items at the bottom, and express indifference to all others. In addition, when aggregating opinions, a judging body might be decisive about some parts of the ranking but ambiguous about others. In this paper we extend the well-known Mallows model (MM) (Mallows, 1957) to accommodate item indifference, a phenomenon that can arise for a variety of reasons, such as those mentioned above. The underlying grouping of similar items motivates the proposed Clustered Mallows Model (CMM). The CMM can be interpreted as a Mallows distribution for tied ranks where ties are learned from the data. The CMM provides the flexibility to combine strict and indifferent relations, achieving a simpler and more robust representation of rank collections in the form of ordered clusters. Bayesian inference for the CMM belongs to the class of doubly-intractable problems, since the model's normalisation constant is not available in closed form. We overcome this challenge by sampling from the posterior with a version of the exchange algorithm (Murray et al., 2006). Real data analyses of food preferences and of Formula 1 race results are presented, illustrating the CMM in practical situations.
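
As a point of reference for the model being extended, the following is a minimal sketch of the standard Mallows model for strict rankings. It is not the CMM itself. It assumes the Kendall distance as the discrepancy between rankings and a scale parameter called alpha here; both the distance choice and the names (kendall_distance, mallows_pmf, consensus, alpha) are illustrative assumptions, not notation taken from the paper. The normalisation constant is computed by enumerating all n! rankings, which is only feasible for small n.

```python
from itertools import permutations
from math import exp


def kendall_distance(r, s):
    """Count the item pairs that rank vectors r and s order differently."""
    n = len(r)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (r[i] - r[j]) * (s[i] - s[j]) < 0
    )


def mallows_pmf(consensus, alpha):
    """Brute-force Mallows probabilities over all n! strict rankings.

    Assumption: P(r) is proportional to exp(-alpha * d(r, consensus)),
    with d the Kendall distance. Enumeration is for illustration only.
    """
    n = len(consensus)
    rankings = list(permutations(range(1, n + 1)))
    weights = [exp(-alpha * kendall_distance(r, consensus)) for r in rankings]
    z = sum(weights)  # normalisation constant obtained by enumeration
    return {r: w / z for r, w in zip(rankings, weights)}


# Hypothetical example: three items with consensus ranking (1, 2, 3).
pmf = mallows_pmf(consensus=(1, 2, 3), alpha=1.0)
```

In this sketch the consensus ranking receives the highest probability, and the probability decays as a ranking disagrees with the consensus on more item pairs.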
