Chatterjee's rank correlation coefficient $\xi_n$ is an empirical index for detecting functional dependencies between two variables $X$ and $Y$. It estimates a theoretical quantity $\xi$ that is zero if $X$ and $Y$ are independent and one if $Y$ is a measurable function of $X$. Based on an equivalent characterization of sorted numbers, we derive an upper bound for $\xi_n$ and suggest a simple normalization aimed at reducing its bias for small sample sizes $n$. In Monte Carlo simulations of various models, the normalization reduced the bias in all cases. The mean squared error was also reduced for values of $\xi$ greater than about 0.4. Moreover, we observed that non-parametric confidence intervals for $\xi$ obtained by bootstrapping $\xi_n$ in the usual $n$-out-of-$n$ way have a coverage probability close to zero. This is remedied by an $m$-out-of-$n$ bootstrap without replacement in combination with our normalization method.
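To make the quantities above concrete, the following is a minimal Python sketch of $\xi_n$ under the assumption that the sample contains no ties, using the standard formula $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$, where $r_i$ is the rank of the $Y$ value belonging to the $i$-th smallest $X$ value. In this no-ties case the sum of absolute rank differences is at least $n-1$, so $\xi_n \le (n-2)/(n+1)$. The function names `chatterjee_xi`, `xi_upper_bound`, and `xi_rescaled` are hypothetical, and dividing by the bound is only one plausible reading of "a simple normalization"; the abstract does not state the actual formula, which may differ.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's xi_n, assuming there are no ties in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Reorder the pairs so that x is increasing.
    order = np.argsort(x, kind="stable")
    # Ranks (1..n) of the y values, taken in that order.
    ranks = np.argsort(np.argsort(y[order])) + 1
    # Sum of absolute differences between consecutive ranks.
    total = np.abs(np.diff(ranks)).sum()
    return 1.0 - 3.0 * total / (n**2 - 1)


def xi_upper_bound(n):
    """Largest value xi_n can take without ties: (n - 2) / (n + 1)."""
    return (n - 2) / (n + 1)


def xi_rescaled(x, y):
    """ASSUMPTION: rescale xi_n by its no-ties upper bound.

    This is an illustrative guess, not necessarily the normalization
    proposed in the paper.
    """
    n = np.asarray(x).size
    return chatterjee_xi(x, y) / xi_upper_bound(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=30)
    y = x**2  # Y is a (non-monotone) function of X, so xi = 1
    print(chatterjee_xi(x, y), xi_upper_bound(30), xi_rescaled(x, y))
```

For a strictly monotone relationship the consecutive ranks differ by exactly one, so $\xi_n$ attains the bound $(n-2)/(n+1)$, which is noticeably below one for small $n$. This illustrates why the raw estimator is biased downward when $\xi$ is close to one.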