Standardization of Weighted Ranking Correlation Coefficients

A fundamental problem in statistics is measuring the correlation between two rankings of a set of items. Kendall's $τ$ and Spearman's $ρ$ are well-established correlation coefficients whose symmetric structure guarantees an expected value of zero for two rankings drawn independently and uniformly at random. In many modern applications, however, greater importance is assigned to top-ranked items, which motivates weighted variants of these coefficients. Such weighting schemes generally break the symmetry of the original formulations, so the expected value under independence is no longer zero and a value of zero can no longer be read as "no correlation". We propose a general standardization function $g(\cdot)$ that transforms a ranking correlation coefficient $Γ$ into a standardized form $g(Γ)$ whose expected value under random rankings is zero. The transformation preserves the domain $[-1,1]$, satisfies the boundary conditions, is continuous and increasing, and reduces to the identity for coefficients that already have zero expected value. The construction of $g(x)$ depends on three distributional parameters of $Γ$: its mean, variance, and left variance. Because computing these exactly becomes infeasible for large ranking lengths $n$, we develop accurate numerical estimates based on Monte Carlo sampling, combined with polynomial regression to capture their dependence on $n$.
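To make the non-zero expected value concrete, the following is a minimal sketch of the Monte Carlo step, not the authors' implementation. The coefficient `weighted_coefficient` is a hypothetical top-weighted Spearman-style measure chosen only for illustration (position $i$ gets weight $1/i$); the function names, the sample count, and the seed are likewise assumptions. Left variance is not estimated here because the abstract does not define it, and the polynomial regression over $n$ is omitted.

```python
import itertools
import random
from statistics import fmean, pvariance


def weighted_coefficient(sigma):
    """Hypothetical top-weighted coefficient (an assumption, not from the paper).

    sigma[i - 1] is the rank the second ranking assigns to the item that the
    first ranking places at position i. Position i carries weight 1 / i, so
    disagreements near the top cost more. Requires len(sigma) >= 2.
    Returns 1 for identical rankings and -1 for fully reversed rankings.
    """
    n = len(sigma)
    d = sum((i - s) ** 2 / i for i, s in enumerate(sigma, start=1))
    # Weighted distance of the reversed ranking, which is the maximum here.
    d_max = sum((2 * i - n - 1) ** 2 / i for i in range(1, n + 1))
    return 1.0 - 2.0 * d / d_max


def exact_moments(n):
    """Mean and variance over all n! permutations (feasible only for small n)."""
    values = [weighted_coefficient(p)
              for p in itertools.permutations(range(1, n + 1))]
    return fmean(values), pvariance(values)


def monte_carlo_moments(n, samples=20_000, seed=0):
    """Estimate mean and variance from uniformly random permutations."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    values = []
    for _ in range(samples):
        rng.shuffle(ranks)  # uniform random permutation of the ranks
        values.append(weighted_coefficient(ranks))
    return fmean(values), pvariance(values)


if __name__ == "__main__":
    print("exact, n=3:", exact_moments(3))
    print("Monte Carlo, n=50:", monte_carlo_moments(50))
```

For $n = 3$ the exact mean of this illustrative coefficient is $1/24$ rather than $0$, which is the kind of offset a standardization function is meant to remove. Enumeration costs $n!$ evaluations, so sampling is the only practical option beyond small $n$.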

