Exact Detection Thresholds and Minimax Optimality of Chatterjee's Correlation Coefficient

Recently, Chatterjee (2021) introduced a new rank-based correlation coefficient that measures the strength of dependence between two random variables. This coefficient has already attracted much attention because it converges to the Dette-Siburg-Stoimenov measure (see Dette et al. (2013)), which equals $0$ if and only if the variables are independent and $1$ if and only if one variable is a function of the other. Further, Chatterjee's coefficient is computable in (near) linear time, which makes it appropriate for large-scale applications. In this paper, we expand the theoretical understanding of Chatterjee's coefficient in two directions: (a) First, we consider the problem of testing for independence using Chatterjee's correlation. We obtain its asymptotic distribution under any changing sequence of alternatives converging to the null hypothesis of independence. We further obtain a general result that gives exact detection thresholds and limiting power for Chatterjee's test of independence under natural nonparametric alternatives converging to the null. As applications of this general result, we prove an $n^{-1/4}$ detection boundary for this test and explicitly compute the limiting local power on the detection boundary for alternatives commonly studied in the literature. (b) We then construct a test for non-trivial levels of dependence using Chatterjee's coefficient. In contrast to testing for independence, we prove that, in this case, Chatterjee's coefficient indeed yields a minimax optimal procedure with an $n^{-1/2}$ detection boundary. Our proof techniques rely on Stein's method of exchangeable pairs, a non-asymptotic projection result, and information-theoretic lower bounds.

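For readers unfamiliar with the statistic, the following is a minimal sketch of how Chatterjee's coefficient and a one-sided test of independence might be computed. It is an illustration under stated assumptions, not the procedure analyzed in this paper. The function names `chatterjee_xi` and `independence_pvalue` are hypothetical. The formula is the standard one from Chatterjee (2021): sort the pairs by $X$, let $r_i$ be the number of $Y$ values at most the $i$-th sorted $Y$ and $l_i$ the number at least as large, and set $\xi_n = 1 - n \sum_i |r_{i+1} - r_i| / (2 \sum_i l_i (n - l_i))$. The p-value assumes a continuous $Y$, for which $\sqrt{n}\,\xi_n$ is asymptotically $N(0, 2/5)$ under independence.

```python
import math

import numpy as np


def chatterjee_xi(x, y, seed=None):
    """Chatterjee's rank correlation xi_n(X, Y); hypothetical helper name."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 2 or y.size != n:
        raise ValueError("x and y must have the same length, at least 2")

    # Sort the pairs by x; ties in x are broken uniformly at random.
    rng = np.random.default_rng(seed)
    order = np.lexsort((rng.random(n), x))  # last key (x) is the primary key
    y_sorted = y[order]

    # r_i = #{j : y_j <= y_i} and l_i = #{j : y_j >= y_i}, via binary search.
    y_asc = np.sort(y_sorted)
    r = np.searchsorted(y_asc, y_sorted, side="right")
    l = n - np.searchsorted(y_asc, y_sorted, side="left")

    denom = 2.0 * np.sum(l * (n - l))
    if denom == 0:
        raise ValueError("y is constant; the coefficient is undefined")
    return 1.0 - n * np.abs(np.diff(r)).sum() / denom


def independence_pvalue(xi, n):
    """One-sided p-value from the N(0, 2/5) null limit (continuous Y assumed)."""
    z = math.sqrt(n) * xi / math.sqrt(2.0 / 5.0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.uniform(-10.0, 10.0, size=n)
    for label, y in [("independent", rng.normal(size=n)), ("y = sin(x)", np.sin(x))]:
        xi = chatterjee_xi(x, y, seed=1)
        print(label, round(xi, 3), independence_pvalue(xi, n))
```

The cost is dominated by two sorts, which gives the $O(n \log n)$ running time referred to as near linear above. The test rejects for large values of $\xi_n$ only, since the population measure is nonnegative and vanishes exactly under independence.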