We propose a nonparametric approach to testing conditional independence and estimating conditional association, generalizing the Cochran-Mantel-Haenszel (CMH) test and odds-ratio estimator to continuous sample spaces. The method uses multiscale scanning to decompose the sample space into a cascade of $2\times 2 \times T$ tables. Following the CMH test, we condition on the marginal order statistics, which are "almost ancillary" with respect to conditional dependence. This strategy helps overcome a key challenge faced by other methods that discretize the sample space: we achieve consistency without requiring stratum sample sizes to grow to infinity, a constraint that is often difficult to satisfy in practice. Our method produces easy-to-compute test statistics with a known asymptotic null distribution under the conditional sampling model, and its computational cost scales almost linearly with the sample size. Our simulation results demonstrate reliable Type I error control, even with small samples and high-dimensional conditioning, and power competitive with state-of-the-art tests. Finally, a case study on Uber ride-share data highlights the method's dual capability, inherited from the CMH test, to both test for conditional association and identify its nature. By providing summary statistics that capture the strength and direction of local associations, our method offers practitioners a useful tool for learning conditional dependencies.
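For orientation, the classical building block that the abstract generalizes can be sketched as follows. This is a minimal illustration of the standard CMH statistic and the Mantel-Haenszel common odds-ratio estimator on a single $2\times 2 \times T$ table, not the multiscale procedure proposed here. The function name `cmh_test`, the `(T, 2, 2)` array layout, and the omission of a continuity correction are assumptions made for the example.

```python
import math
import numpy as np


def cmh_test(tables):
    """Classical CMH test and Mantel-Haenszel odds ratio (illustrative sketch).

    tables: array-like of shape (T, 2, 2), one 2x2 table per stratum.
    Returns (statistic, p_value, odds_ratio). No continuity correction.
    """
    t = np.asarray(tables, dtype=float)
    a, b, c, d = t[:, 0, 0], t[:, 0, 1], t[:, 1, 0], t[:, 1, 1]
    n = a + b + c + d

    # Strata with fewer than two observations carry no information.
    keep = n > 1
    a, b, c, d, n = a[keep], b[keep], c[keep], d[keep], n[keep]

    # Mean and variance of the top-left cell given the stratum margins
    # (hypergeometric distribution under conditional independence).
    expected = (a + b) * (a + c) / n
    variance = (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))

    statistic = (a.sum() - expected.sum()) ** 2 / variance.sum()

    # Upper tail of a chi-square with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))

    # Mantel-Haenszel estimate of the common odds ratio across strata.
    odds_ratio = (a * d / n).sum() / (b * c / n).sum()
    return statistic, p_value, odds_ratio


if __name__ == "__main__":
    # Two made-up strata that share the same direction of association.
    demo = [[[10, 5], [5, 10]],
            [[8, 4], [4, 8]]]
    stat, p, mh_or = cmh_test(demo)
    print(f"CMH statistic = {stat:.3f}, p = {p:.4f}, MH odds ratio = {mh_or:.2f}")
```

The test statistic answers whether there is conditional association at all, while the odds ratio reports its direction and strength (values above 1 indicate positive association within strata). This is the dual capability that the abstract describes as inherited from the CMH test. The sketch assumes at least one stratum has nonzero variance and a nonzero `b * c` term.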