In this article we develop a new method for summarizing a ranking distribution, \textit{i.e.} a probability distribution on the symmetric group $\mathfrak{S}_n$, beyond the classical theory of consensus and Kemeny medians. Based on the notion of \textit{local ranking median}, we introduce the concept of \textit{consensus ranking distribution} ($\crd$), a sparse mixture model of Dirac masses on $\mathfrak{S}_n$, in order to approximate a ranking distribution with small distortion from a mass transportation perspective. We prove that by choosing the popular Kendall $\tau$ distance as the cost function, the optimal distortion can be expressed as a function of pairwise probabilities, paving the way for the development of efficient learning methods that do not suffer from the lack of vector space structure on $\mathfrak{S}_n$. In particular, we propose a top-down tree-structured statistical algorithm that allows for the progressive refinement of a CRD based on ranking data, from the Dirac mass at a Kemeny median at the root of the tree to the empirical ranking data distribution itself when the tree is fully grown. In addition to the theoretical arguments developed, the relevance of the algorithm is empirically supported by various numerical experiments.
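To make the cost function concrete, the following is a minimal sketch of the Kendall $\tau$ distance between two rankings, together with the empirical pairwise probabilities it connects to in the distortion bound. The function names and the rank-vector encoding are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def kendall_tau_distance(sigma, pi):
    """Kendall tau distance: number of item pairs on which the two
    rankings disagree. sigma[i] and pi[i] give the rank of item i."""
    n = len(sigma)
    return sum(
        1
        for i, j in combinations(range(n), 2)
        # the pair (i, j) is a disagreement when the two rankings
        # order i and j in opposite directions
        if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
    )

def pairwise_probabilities(sample):
    """Empirical pairwise probabilities p[(i, j)] = fraction of rankings
    in the sample that place item i before (lower rank than) item j."""
    n = len(sample[0])
    return {
        (i, j): sum(s[i] < s[j] for s in sample) / len(sample)
        for i, j in combinations(range(n), 2)
    }
```

For instance, the distance between a ranking and its reversal equals $\binom{n}{2}$, the maximal value, while identical rankings are at distance zero.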