Chatterjee's rank correlation coefficient $\xi_n$ is an empirical index for detecting functional dependencies between two variables $X$ and $Y$. It estimates a theoretical quantity $\xi$ that is zero if $X$ and $Y$ are independent and one if $Y$ is a measurable function of $X$. Based on an equivalent characterization of sorted numbers, we derive an upper bound for $\xi_n$ and suggest a simple normalization aimed at reducing its bias for small sample sizes $n$. In Monte Carlo simulations, the normalization reduced the bias in every case considered. It also reduced the mean squared error for values of $\xi$ greater than about 0.4. Moreover, we observed that confidence intervals for $\xi$ based on bootstrapping $\xi_n$ in the usual $n$-out-of-$n$ way have a coverage probability close to zero. This is remedied by an $m$-out-of-$n$ bootstrap without replacement in combination with our normalization method.
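To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes samples without ties and uses the standard no-ties form $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$, where $r_i$ is the rank of the $Y$ value belonging to the $i$-th smallest $X$ value. Because adjacent distinct ranks differ by at least 1, this form cannot exceed $(n-2)/(n+1)$. Dividing by that bound is shown here as one plausible normalization and is an assumption; the normalization proposed in the paper may differ. The function names `chatterjee_xi`, `xi_upper_bound`, `normalized_xi`, and `subsample_replicates` are hypothetical, and the last one only draws $m$-out-of-$n$ replicates without replacement; how such replicates are turned into a confidence interval is not specified in the abstract and is left out.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's xi_n for samples without ties (assumed here)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Sort the pairs by x, then rank the y values in that order (ranks 1..n).
    order = np.argsort(x, kind="stable")
    ranks = np.argsort(np.argsort(y[order])) + 1
    # Sum of absolute differences between consecutive ranks.
    total = np.abs(np.diff(ranks)).sum()
    return 1.0 - 3.0 * total / (n**2 - 1)


def xi_upper_bound(n):
    """Largest value of the no-ties xi_n: reached when all rank steps equal 1."""
    return (n - 2) / (n + 1)


def normalized_xi(x, y):
    """Assumed normalization: xi_n divided by its no-ties upper bound (n >= 3)."""
    return chatterjee_xi(x, y) / xi_upper_bound(len(x))


def subsample_replicates(x, y, m, n_rep, rng):
    """Normalized xi on n_rep subsamples of size m drawn without replacement."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    reps = np.empty(n_rep)
    for b in range(n_rep):
        idx = rng.choice(len(x), size=m, replace=False)
        reps[b] = normalized_xi(x[idx], y[idx])
    return reps
```

For a strictly monotone relationship, for example `x = np.arange(10.0)` and `y = x**2`, the raw coefficient equals the bound $8/11$ rather than 1, which illustrates the small-sample bias that the normalization targets; the normalized value is 1 in that case.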