Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel at identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $\xi^{(h,F)}_n$, characterized by a continuous bivariate function $h$ and a cumulative distribution function (cdf) $F$. By offering a range of choices for $h$ and $F$, $\xi^{(h,F)}_n$ encompasses a diverse class of novel correlation coefficients, while also including Chatterjee's correlation coefficient (Chatterjee, 2021) as a special case. We prove that $\xi^{(h,F)}_n$ converges almost surely to a deterministic limit $\xi^{(h,F)}$ as the sample size $n$ approaches infinity. In addition, under appropriate conditions on $h$ and $F$, the limit $\xi^{(h,F)}$ satisfies three appealing properties: (P1) it lies in the range $[0,1]$; (P2) it equals 1 if and only if $Y$ is a measurable function of $X$; and (P3) it equals 0 if and only if $Y$ is independent of $X$. As illustrated by our numerical experiments, our proposals give practitioners a variety of options for choosing the correlation coefficient best suited to their specific practical needs.
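For orientation, the special case mentioned above can be sketched in a few lines. The following is a minimal illustration of Chatterjee's coefficient in its standard no-ties form, $\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$, where the pairs are sorted by $X$ and $r_i$ is the rank of the $i$-th $Y$ value in that order. It does not implement the general $\xi^{(h,F)}_n$, whose definition is not given in this abstract; the function name `chatterjee_xi` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's rank correlation, assuming no ties in x or y.

    Illustrative sketch only; the name and interface are hypothetical.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Reorder the pairs so that x is increasing.
    order = np.argsort(x, kind="stable")
    y_sorted = y[order]
    # r_i = rank of the i-th y value (1..n); valid when there are no ties.
    ranks = np.argsort(np.argsort(y_sorted)) + 1
    # xi_n = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1)
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=2000)
    # Non-monotone functional dependence: Y is a measurable function of X.
    print("y = x^2      :", chatterjee_xi(x, x**2))
    # Independence: Y is drawn independently of X.
    print("independent  :", chatterjee_xi(x, rng.normal(size=x.size)))
```

On simulated data of this kind, the first value should be close to 1 and the second close to 0, mirroring properties (P2) and (P3) for the limiting quantity. Ties would require the more general form of the statistic, which this sketch deliberately omits.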