This paper proposes a nonparametric test of pairwise independence of one random variable from a large pool of other random variables. The test statistic is the maximum of several Chatterjee rank correlations, and critical values are computed via a block multiplier bootstrap. Simulations show that other popular tests based on distance covariances do not necessarily control size under this null. Our test, by contrast, is shown to control size asymptotically and uniformly over a large class of data-generating processes, even when the number of variables is much larger than the sample size. The test is consistent against any fixed alternative. It can be combined with a stepwise procedure that selects the variables in the pool that violate independence while controlling the family-wise error rate. All formal results leave the dependence among the variables in the pool completely unrestricted. In simulations, our test is typically more powerful than competing methods (in settings where those methods are valid), particularly in high-dimensional scenarios or when the variables in the pool are dependent on one another.
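As a rough illustration of the test statistic, the following is a minimal sketch of Chatterjee's rank correlation and a maximum over the pool. It uses the standard rank formula: sort the pairs by x, let r_i = #{j : y_j <= y_(i)} and l_i = #{j : y_j >= y_(i)}, and set xi_n = 1 - n * sum|r_{i+1} - r_i| / (2 * sum l_i (n - l_i)). Several points are assumptions, not taken from the abstract. The names `chatterjee_xi` and `max_xi_statistic` are hypothetical. The direction xi_n(Z_j, X), with pool variable Z_j first and the target X second, and the absence of any scaling are guesses, since the abstract does not specify which correlations enter the maximum. Ties in x are broken by original index rather than at random. The block multiplier bootstrap used for critical values is not shown.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def chatterjee_xi(x: Sequence[float], y: Sequence[float]) -> float:
    """Chatterjee's rank correlation xi_n(x, y): how well y is explained by x.

    Uses the general formula that also handles ties in y.
    Assumption: ties in x are broken by original index (Chatterjee breaks them at random).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    # Rearrange the y-values in increasing order of x (stable sort keeps index order on ties).
    order = sorted(range(n), key=lambda i: x[i])
    y_by_x = [y[i] for i in order]
    y_ascending = sorted(y)
    # r_i = #{j : y_j <= y_(i)} and l_i = #{j : y_j >= y_(i)}, via binary search.
    r = [bisect_right(y_ascending, v) for v in y_by_x]
    l = [n - bisect_left(y_ascending, v) for v in y_by_x]
    numerator = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    denominator = 2 * sum(li * (n - li) for li in l)
    if denominator == 0:
        raise ValueError("y is constant; xi_n is undefined")
    return 1.0 - numerator / denominator


def max_xi_statistic(target: Sequence[float], pool: Sequence[Sequence[float]]) -> float:
    """Hypothetical max statistic: largest xi_n(pool_j, target) over the pool.

    Assumption: one direction per pool variable and no scaling; the paper's
    exact choice of correlations is not stated in the abstract.
    """
    return max(chatterjee_xi(column, target) for column in pool)


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5]
    pool = [
        [1, 2, 3, 4, 5],  # target is a monotone function of this variable
        [3, 1, 4, 5, 2],  # scrambled ordering
    ]
    print(max_xi_statistic(target, pool))  # 0.5
```

With n = 5 and no ties, a perfectly monotone relationship gives xi_n = 1 - 3/(n + 1) = 0.5, so the attainable maximum approaches 1 only as n grows. Taking a maximum over many such coefficients is what makes a bootstrap (rather than a fixed normal quantile) natural for the critical value when the pool is large.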