Testing independence between random vectors is a fundamental problem in statistics. Distance correlation, a dependence measure that has recently become popular, yields an independence test that is universally consistent against all alternatives with finite moments. However, when data are subject to selection bias or are collected from multiple sources or under different sampling schemes, spurious dependence may arise. This creates a need for methods that can combine data from different sources and correct for these biases. In this paper, we study the estimation of distance covariance and distance correlation under multiple biased sampling models, which provide a natural framework for addressing these issues. We establish theoretical properties of the distance covariance and distance correlation estimators, including their strong consistency and asymptotic null distributions, as well as the rate at which the test statistic diverges under sequences of alternatives approaching the null. We propose a weighted permutation procedure to determine the critical value of the independence test. Simulation studies demonstrate that our approach improves both the estimation of distance correlation and the power of the test.
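For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of the classical, unweighted sample distance correlation (the V-statistic form based on double-centered pairwise distance matrices), together with a plain permutation test. It is not the bias-corrected estimator or the weighted permutation procedure studied in this paper: it assumes a single i.i.d. sample with no selection bias, and the function names `distance_correlation` and `permutation_pvalue` are hypothetical names chosen for illustration.

```python
import numpy as np


def _centered_distances(z):
    """Double-centered Euclidean distance matrix of a sample (rows = observations)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D array as n scalar observations
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Classical (unweighted) sample distance correlation, V-statistic form."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    # max(...) guards against tiny negative values from floating-point error
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Plain (unweighted) permutation p-value for the independence test."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = distance_correlation(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Permuting y breaks any dependence between x and y
        if distance_correlation(x, y[rng.permutation(len(y))]) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Illustration: y = x**2 on a symmetric grid has zero Pearson correlation,
# but its distance correlation is clearly positive.
x_demo = np.linspace(-1.0, 1.0, 51)
y_demo = x_demo ** 2
dcor_demo = distance_correlation(x_demo, y_demo)
```

Under biased sampling, the uniform averages in this sketch would presumably be replaced by weighted ones, which is the setting the paper addresses. The sketch only fixes notation for the unweighted baseline.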