A fundamental problem in statistics is measuring the correlation between two rankings of a set of items. Kendall's $τ$ and Spearman's $ρ$ are well-established correlation coefficients whose symmetric structure guarantees a zero expected value between two rankings chosen independently and uniformly at random. In many modern applications, however, greater importance is assigned to top-ranked items, motivating weighted variants of these coefficients. Such weighting schemes generally break the symmetry of the original formulations, resulting in a non-zero expected value under independence and compromising the interpretation of zero correlation. We propose a general standardization function $g(\cdot)$ that transforms a ranking correlation coefficient $Γ$ into a standardized form $g(Γ)$ with zero expected value under randomness. The transformation preserves the domain $[-1,1]$, satisfies the boundary conditions, is continuous and increasing, and reduces to the identity for coefficients that already satisfy the zero-expected-value property. The construction of $g(x)$ depends on three distributional parameters of $Γ$: its mean, variance, and left variance. Since computing these parameters exactly becomes infeasible for large ranking lengths $n$, we develop accurate numerical estimates based on Monte Carlo sampling combined with polynomial regression to capture their dependence on $n$.
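As a rough illustration of the estimation step described above, the following Python sketch samples random rankings, estimates the three distributional parameters of a coefficient for several values of $n$, and fits a polynomial in $n$ to each. Everything specific in it is an assumption made for illustration and is not taken from the paper: the coefficient `toy_weighted_tau` is a hypothetical top-weighted Kendall-style measure, "left variance" is assumed to mean the mean squared deviation of the values below the mean, and the regression is assumed to be an ordinary least-squares polynomial of degree 2 in $n$. The standardization function $g$ itself is not reproduced here.

```python
import numpy as np


def toy_weighted_tau(x, y):
    """Hypothetical top-weighted Kendall-style coefficient in [-1, 1].

    x and y are rankings given as arrays of 1-based ranks. Items ranked
    near the top in either ranking get larger weights (an assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = 1.0 / x + 1.0 / y                    # per-item weight
    i, j = np.triu_indices(len(x), k=1)      # all pairs i < j
    s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])  # +1 concordant, -1 discordant
    w = u[i] + u[j]                          # per-pair weight, always positive
    return float(np.sum(w * s) / np.sum(w))


def monte_carlo_moments(n, samples, rng):
    """Estimate mean, variance, and (assumed) left variance for length n.

    The first ranking is fixed to the identity; this is enough because the
    toy coefficient does not change when both rankings are relabeled jointly.
    """
    ident = np.arange(1, n + 1)
    vals = np.array(
        [toy_weighted_tau(ident, rng.permutation(n) + 1) for _ in range(samples)]
    )
    mean = vals.mean()
    var = vals.var()
    left = vals[vals < mean]
    # Assumed definition: mean squared deviation of the values below the mean.
    left_var = float(np.mean((left - mean) ** 2)) if left.size else 0.0
    return float(mean), float(var), left_var


def fit_moment_trends(ns, samples, degree=2, seed=0):
    """Fit one polynomial in n per parameter (degree is an assumption)."""
    rng = np.random.default_rng(seed)
    moments = np.array([monte_carlo_moments(n, samples, rng) for n in ns])
    # np.polyfit returns coefficients from the highest degree down.
    return [np.polyfit(ns, moments[:, k], degree) for k in range(3)]


if __name__ == "__main__":
    mean_c, var_c, left_c = fit_moment_trends([5, 6, 8, 10, 12, 15], samples=2000)
    # Evaluate the fitted trends at a length that was not sampled.
    print("predicted mean at n=20:", np.polyval(mean_c, 20))
    print("predicted variance at n=20:", np.polyval(var_c, 20))
    print("predicted left variance at n=20:", np.polyval(left_c, 20))
```

Fixing one ranking and permuting the other keeps each sample cheap. Extrapolating a low-degree fit far beyond the sampled values of $n$ should be treated with caution, and a different regressor (for example $1/n$) may suit a given coefficient better.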