Chatterjee's rank correlation coefficient $\xi_n$ is an empirical index for detecting functional dependencies between two variables $X$ and $Y$. It is an estimator for a theoretical quantity $\xi$ that is zero if $X$ and $Y$ are independent and one if $Y$ is a measurable function of $X$. Based on an equivalent characterization of sorted numbers, we derive an upper bound for $\xi_n$ and suggest a simple normalization aimed at reducing its bias for small sample sizes $n$. In Monte Carlo simulations covering various scenarios, the normalization reduced the bias in every case. It also reduced the mean squared error for values of $\xi$ greater than about 0.4. Moreover, we observed that non-parametric confidence intervals for $\xi$ based on bootstrapping $\xi_n$ in the usual n-out-of-n way have a coverage probability close to zero. This is remedied by an m-out-of-n bootstrap without replacement in combination with our normalization method.
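The abstract does not restate how $\xi_n$ is computed or the form of the bound. As a sketch, assuming continuous data without ties, Chatterjee's coefficient sorts the pairs by $X$, lets $r_i$ be the rank of the $Y$ value in the $i$-th sorted pair, and sets $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$. Because the $r_i$ form a permutation of $1,\dots,n$, the sum of absolute consecutive differences is at least $n-1$, which gives $\xi_n \le (n-2)/(n+1)$. Dividing by this value is one simple normalization consistent with the description above; we assume it here for illustration, and the paper's own bound and normalization may be stated differently. The names `xi_n`, `xi_max` and `xi_normalized` are hypothetical.

```python
import numpy as np


def xi_n(x, y):
    """Chatterjee's xi_n, assuming no ties in x or in y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    order = np.argsort(x, kind="stable")  # indices that sort the pairs by x
    ranks = np.empty(n)
    ranks[np.argsort(y, kind="stable")] = np.arange(1, n + 1)  # rank of each y_j
    r = ranks[order]  # y-ranks listed in increasing order of x
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n * n - 1.0)


def xi_max(n):
    """Upper bound (n - 2) / (n + 1) of xi_n for data without ties."""
    return (n - 2.0) / (n + 1.0)


def xi_normalized(x, y):
    """xi_n divided by its upper bound (illustrative normalization); needs n >= 3."""
    return xi_n(x, y) / xi_max(len(x))


if __name__ == "__main__":
    x = np.arange(10.0)
    print(xi_n(x, x))           # equals the bound 8/11 (up to rounding)
    print(xi_normalized(x, x))  # approximately 1.0
```

For a strictly monotone relation the raw $\xi_n$ reaches exactly $(n-2)/(n+1)$, e.g. $8/11 \approx 0.73$ for $n = 10$, even though $\xi = 1$; this illustrates the small-sample downward bias that the normalization targets.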
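The m-out-of-n bootstrap without replacement (also known as subsampling) mentioned in the abstract can be sketched as follows. The example assumes the normalization $\xi_n (n+1)/(n-2)$ derived from the no-ties bound, a $\sqrt{n}$ convergence rate, and the standard subsampling interval that rescales the quantiles of $\sqrt{m}\,(\hat\xi_m - \hat\xi_n)$ from sample size $m$ to $n$. These are illustrative assumptions, not necessarily the paper's exact construction, and `subsample_ci` with its parameters `m`, `n_boot`, `alpha` and `seed` are hypothetical names.

```python
import numpy as np


def xi_n(x, y):
    """Chatterjee's xi_n for data without ties (assumption)."""
    n = len(x)
    order = np.argsort(x, kind="stable")  # sort pairs by x
    ranks = np.empty(n)
    ranks[np.argsort(y, kind="stable")] = np.arange(1, n + 1)  # ranks of y
    r = ranks[order]  # y-ranks in x-order
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n * n - 1.0)


def xi_normalized(x, y):
    """xi_n divided by its no-ties upper bound (n - 2) / (n + 1); needs n >= 3."""
    n = len(x)
    return xi_n(x, y) * (n + 1.0) / (n - 2.0)


def subsample_ci(x, y, m, n_boot=1000, alpha=0.05, seed=0):
    """Hypothetical m-out-of-n (without replacement) interval for xi.

    Assumes a sqrt(n) rate and uses the subsampling distribution of
    sqrt(m) * (estimate on subsample - full-sample estimate).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if not 3 <= m < n:
        raise ValueError("need 3 <= m < n")
    rng = np.random.default_rng(seed)
    est = xi_normalized(x, y)
    roots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(n, size=m, replace=False)  # draw without replacement
        roots[b] = np.sqrt(m) * (xi_normalized(x[idx], y[idx]) - est)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # invert the root quantiles and rescale from sample size m to n
    return est - q_hi / np.sqrt(n), est - q_lo / np.sqrt(n)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.uniform(-1, 1, 200)
    y = x ** 2 + 0.1 * rng.normal(size=200)
    print(xi_normalized(x, y), subsample_ci(x, y, m=40))
```

The delicate choice is $m$: for subsampling asymptotics it must grow with $n$ while $m/n \to 0$, and the value $m = 40$ in the example is arbitrary.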