The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given random variables Z. The CRT assumes that the conditional distribution of X given Z is known, uses it to generate samples under the null hypothesis, and compares them with the observed data. The aim of this paper is to develop a novel alternative to the CRT that uses nearest-neighbor sampling and does not assume the exact form of the distribution of X given Z. Specifically, we use the computationally efficient 1-nearest-neighbor method to approximate the conditional distribution that encodes the null hypothesis. We then show theoretically that the distribution of the generated samples is close to the true conditional distribution in total variation distance. As our test statistic, we take the classifier-based conditional mutual information estimator; as an empirical version of a fundamental information-theoretic quantity, it captures conditional dependence well. We show that the proposed test is computationally fast while controlling both type I and type II errors. Finally, we demonstrate the efficiency of the proposed test on both synthetic and real data.
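The nearest-neighbor sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the function names (`nn_sample_x`, `abs_partial_corr`, `nn_crt_pvalue`) are hypothetical, the data are split into a fixed pool half and test half, variation across draws comes from random subsampling of the pool, and a simple absolute partial correlation stands in for the classifier-based conditional mutual information estimator.

```python
import numpy as np


def nn_sample_x(x_pool, z_pool, z_query):
    """For each row of z_query, return the x paired with its nearest z in the pool."""
    # Pairwise squared Euclidean distances, shape (n_query, n_pool).
    d2 = ((z_query[:, None, :] - z_pool[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the 1-nearest neighbor
    return x_pool[nearest]


def abs_partial_corr(x, y, z):
    """Stand-in statistic (assumption): |corr| of x and y after regressing out z."""
    design = np.column_stack([np.ones(len(z)), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return abs(np.corrcoef(rx, ry)[0, 1])


def nn_crt_pvalue(x, y, z, n_draws=200, seed=0):
    """CRT-style p-value with X resampled by 1-nearest-neighbor matching on Z."""
    rng = np.random.default_rng(seed)
    half = len(x) // 2
    # Fixed split (assumption): first half is the sampling pool, second half is tested.
    x_pool, z_pool = x[:half], z[:half]
    x_te, y_te, z_te = x[half:], y[half:], z[half:]

    t_obs = abs_partial_corr(x_te, y_te, z_te)
    t_null = np.empty(n_draws)
    for b in range(n_draws):
        # Random sub-pool per draw (assumption) so that draws differ from each other.
        idx = rng.choice(half, size=max(1, half // 2), replace=False)
        x_fake = nn_sample_x(x_pool[idx], z_pool[idx], z_te)
        t_null[b] = abs_partial_corr(x_fake, y_te, z_te)

    # Standard randomization p-value with the +1 correction.
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_draws)
```

Here `x` and `y` are one-dimensional arrays and `z` has shape `(n, d)`. Because each generated X value is copied from a pool sample whose Z is close to the query Z, it approximately follows the conditional distribution of X given Z without that distribution being specified, which is the property the abstract relies on.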