We describe a data-efficient, kernel-based approach to statistical testing of conditional independence. A major challenge of conditional independence testing, absent in tests of unconditional independence, is to obtain the correct test level (the specified upper bound on the rate of false positives) while still attaining competitive test power. Excess false positives arise from bias in the test statistic, which is obtained using nonparametric kernel ridge regression. We propose three methods for bias control to correct the test level, based on data splitting, auxiliary data, and (where possible) simpler function classes. We show that these strategies, used in combination, are effective on both synthetic and real-world data.
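To make the role of data splitting concrete, the following is a minimal sketch and not the test statistic proposed here. It swaps in a simpler residual-product statistic: kernel ridge regressions of X on Z and of Y on Z are fitted on one half of the sample, and the residuals are evaluated on the held-out half, so that the regression fits and the statistic do not reuse the same points. The function names, the Gaussian kernel, and the `ridge` and `bandwidth` values are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def krr_fit_predict(z_train, t_train, z_test, ridge=1e-2, bandwidth=1.0):
    """Fit kernel ridge regression of t on z, then predict at z_test."""
    n = z_train.shape[0]
    k = gaussian_kernel(z_train, z_train, bandwidth)
    alpha = np.linalg.solve(k + n * ridge * np.eye(n), t_train)
    return gaussian_kernel(z_test, z_train, bandwidth) @ alpha


def split_residual_statistic(x, y, z, seed=0):
    """Residual-product statistic with regressions fitted on a separate split.

    x, y: arrays of shape (n,); z: array of shape (n, d).
    This is an illustrative stand-in, not the kernel statistic of the paper.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2 :]

    # Regressions are fitted on `train`; residuals are computed on `test`.
    rx = x[test] - krr_fit_predict(z[train], x[train], z[test])
    ry = y[test] - krr_fit_predict(z[train], y[train], z[test])

    prod = rx * ry
    # Normalised mean of residual products; large |value| suggests dependence.
    return np.sqrt(len(test)) * prod.mean() / prod.std()
```

In this sketch, a value far from zero points toward conditional dependence of X and Y given Z. The split costs half the sample for the statistic, which is the kind of level-versus-power trade-off the abstract refers to.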