Motivated by conditional independence (CI) testing, an essential step in constraint-based causal discovery algorithms, we study the nonparametric von Mises estimator for the entropy of multivariate distributions, built on a kernel density estimator. We establish an exponential concentration inequality for this estimator. Based on the estimator, we design a CI test, called VM-CI, which achieves optimal parametric rates under smoothness assumptions. Leveraging the exponential concentration, we prove a tight upper bound on the overall error of VM-CI. This, in turn, allows us to characterize the sample complexity of any constraint-based causal discovery algorithm that uses VM-CI for its CI tests. To the best of our knowledge, this is the first sample complexity guarantee for causal discovery with continuous variables. Furthermore, we show empirically that VM-CI outperforms other popular CI tests in time complexity, sample complexity, or both, which also translates into better performance in structure learning.
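To make the idea concrete, the following is a minimal sketch of an entropy-based CI test, not the authors' exact VM-CI procedure. It assumes the data-split form of the first-order von Mises estimator, in which a kernel density estimate is fit on one half of the sample and the entropy is estimated as the average negative log-density on the other half. Conditional mutual information is then assembled from four entropies, I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), and compared against a threshold. The function names, the isotropic Gaussian kernel, the rule-of-thumb bandwidth, and the threshold value are all illustrative assumptions rather than choices taken from the paper.

```python
import numpy as np


def kde_logpdf(train, query, bandwidth):
    """Log-density of an isotropic Gaussian KDE fit on `train`, evaluated at `query`.

    Both arrays have shape (num_points, dim).
    """
    n, d = train.shape
    # Pairwise squared distances between query and training points: shape (m, n).
    diff = query[:, None, :] - train[None, :, :]
    log_kernel = -0.5 * np.sum(diff ** 2, axis=-1) / bandwidth ** 2
    # Numerically stable log-sum-exp over the training points.
    peak = log_kernel.max(axis=1, keepdims=True)
    log_sum = peak[:, 0] + np.log(np.exp(log_kernel - peak).sum(axis=1))
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi) + d * np.log(bandwidth)
    return log_sum - log_norm


def vm_entropy(samples, bandwidth=None):
    """Data-split first-order von Mises entropy estimate, in nats (assumed form)."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    half = n // 2
    fit, held_out = x[:half], x[half:]
    if bandwidth is None:
        # Scott-style rule of thumb; an illustrative choice, not the paper's.
        bandwidth = fit.std(axis=0).mean() * half ** (-1.0 / (d + 4))
    # Average negative log-density of held-out points under the KDE.
    return -np.mean(kde_logpdf(fit, held_out, bandwidth))


def conditional_mutual_information(x, y, z):
    """Estimate I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    xz = np.column_stack([x, z])
    yz = np.column_stack([y, z])
    xyz = np.column_stack([x, y, z])
    return vm_entropy(xz) + vm_entropy(yz) - vm_entropy(xyz) - vm_entropy(z)


def ci_test(x, y, z, threshold=0.25):
    """Declare X independent of Y given Z when the estimated CMI is small.

    The threshold is a hypothetical placeholder; a principled value would
    come from a concentration bound for the estimator.
    """
    return conditional_mutual_information(x, y, z) <= threshold
```

In this sketch the entropies are in nats, and the estimated conditional mutual information can be slightly negative in finite samples because the four entropy estimates carry different smoothing biases. Calling `ci_test(x, y, z)` on arrays with one row per sample returns `True` when the estimate falls at or below the threshold.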